

<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>image &#8211; Max的程式語言筆記</title>
	<atom:link href="https://stackoverflow.max-everyday.com/tag/image/feed/" rel="self" type="application/rss+xml" />
	<link>https://stackoverflow.max-everyday.com</link>
	<description>I want to be a carefree fool and live happily every day</description>
	<lastBuildDate>Mon, 12 Jan 2026 03:38:33 +0000</lastBuildDate>
	<language>zh-TW</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.1</generator>

<image>
	<url>https://stackoverflow.max-everyday.com/wp-content/uploads/2017/02/max-stackoverflow-256.png</url>
	<title>image &#8211; Max的程式語言筆記</title>
	<link>https://stackoverflow.max-everyday.com</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>How removing blank regions from zi2zi training data affects training</title>
		<link>https://stackoverflow.max-everyday.com/2025/02/zi2zi-train-image-remove-padding/</link>
					<comments>https://stackoverflow.max-everyday.com/2025/02/zi2zi-train-image-remove-padding/#respond</comments>
		
		<dc:creator><![CDATA[max-stackoverflow]]></dc:creator>
		<pubDate>Wed, 19 Feb 2025 07:19:06 +0000</pubDate>
				<category><![CDATA[Machine Learning Notes]]></category>
		<category><![CDATA[image]]></category>
		<guid isPermaLink="false">https://stackoverflow.max-everyday.com/?p=6314</guid>

					<description><![CDATA[In the zi2zi-pytorch version, there are separate...]]></description>
										<content:encoded><![CDATA[
<p>In the zi2zi-pytorch version, the blank regions of both the source font and the target font are removed before training. In theory this is indeed more efficient: with the blank areas excluded, more detail can be captured at the same resolution. I expected scaling problems, but in practice, even though the training data is enlarged while the input at inference time is shrunk, the inference result comes out at the correct size.</p>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img fetchpriority="high" decoding="async" width="532" height="547" src="https://stackoverflow.max-everyday.com/wp-content/uploads/2025/02/2025-02-19_14-58_6a.jpg?v=1739949394" alt="" class="wp-image-6316"/></figure>
</div>


<p>In this example, blue is the enlarged training data, red is the target font to be learned, and black is the inference result.</p>



<p>Removing the blank regions:</p>



<ul class="wp-block-list">
<li>Advantage: it can handle characters whose strokes extend outside the bounding frame.</li>



<li>Disadvantage: spatial information is lost, and occasionally the source font and target font disagree about a character's proportions, although the probability is very, very low. For example, a font author may on a whim make one particular character fill the whole visible area, while in Source Han Sans the same character occupies only a small part of the central region.</li>
</ul>
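
<p>A minimal sketch of the blank-region removal step, using Pillow on a synthetic glyph (the canvas size and shapes here are made up for illustration; zi2zi's actual implementation may differ): invert the image so getbbox() finds the ink, crop to that box, then scale back up to the full canvas.</p>

```python
from PIL import Image, ImageDraw

CANVAS = 64  # hypothetical canvas size

# Build a white canvas with a black "glyph" that sits off-center.
img = Image.new("L", (CANVAS, CANVAS), 255)
ImageDraw.Draw(img).rectangle([10, 20, 29, 49], fill=0)

# getbbox() returns the bounding box of non-zero pixels, so invert
# first (ink becomes 255, background becomes 0).
bbox = Image.eval(img, lambda p: 255 - p).getbbox()  # (left, upper, right, lower)

# Crop away the blank margins, then resize to fill the canvas again,
# so more of the resolution is spent on glyph detail.
trimmed = img.crop(bbox).resize((CANVAS, CANVAS))
print(bbox)  # (10, 20, 30, 50)
```

<p>This is exactly where the scaling question above comes from: the trimmed glyph is stretched, so inference inputs prepared the same way shrink or grow relative to the original frame.</p>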
]]></content:encoded>
					
					<wfw:commentRss>https://stackoverflow.max-everyday.com/2025/02/zi2zi-train-image-remove-padding/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>The effect of ImageFilter.GaussianBlur(radius=sigma) on training data</title>
		<link>https://stackoverflow.max-everyday.com/2025/02/image-filter-gaussian-blur-radius-sigma/</link>
					<comments>https://stackoverflow.max-everyday.com/2025/02/image-filter-gaussian-blur-radius-sigma/#respond</comments>
		
		<dc:creator><![CDATA[max-stackoverflow]]></dc:creator>
		<pubDate>Sun, 16 Feb 2025 15:18:06 +0000</pubDate>
				<category><![CDATA[Machine Learning Notes]]></category>
		<category><![CDATA[image]]></category>
		<category><![CDATA[Python]]></category>
		<guid isPermaLink="false">https://stackoverflow.max-everyday.com/?p=6268</guid>

					<description><![CDATA[In zi2zi-pytorch, there is a 20%...]]></description>
										<content:encoded><![CDATA[
<p>In zi2zi-pytorch, there is a 20% chance that a blur effect is applied to the training data, with three possible radii: [1, 1.5, 2]. That gives a 1/15 chance that a training sample is blurred with radius=2. What does that do to the data? This image illustrates it:</p>



<figure class="wp-block-image size-large"><img decoding="async" width="1024" height="285" src="https://stackoverflow.max-everyday.com/wp-content/uploads/2025/02/0_00032_0-tile-1024x285.png?v=1739717709" alt="" class="wp-image-6269" srcset="https://stackoverflow.max-everyday.com/wp-content/uploads/2025/02/0_00032_0-tile-1024x285.png?v=1739717709 1024w, https://stackoverflow.max-everyday.com/wp-content/uploads/2025/02/0_00032_0-tile-600x167.png?v=1739717709 600w, https://stackoverflow.max-everyday.com/wp-content/uploads/2025/02/0_00032_0-tile-768x214.png?v=1739717709 768w, https://stackoverflow.max-everyday.com/wp-content/uploads/2025/02/0_00032_0-tile.png?v=1739717709 1099w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>



<p>Notes:</p>



<ul class="wp-block-list">
<li>The two small images in the left picture are the ZenMaruGothic-Regular style and the Source Han Sans demi-light style.</li>



<li>Left: the 80% case, where no blur is applied.</li>



<li>Right: the roughly 7% case (blur with radius=2), where the bleeding effect joins strokes that did not originally cross, and L-shaped right angles become rounded.</li>



<li>The advantage of blur is anti-aliasing; the disadvantage is the bleeding.</li>
</ul>



<p>Code: dataset.py</p>



<pre class="wp-block-code"><code># 20% chance: blur both source and target with a randomly chosen sigma
if self.blur and random.random() &gt; 0.8:
    sigma_list = &#91;1, 1.5, 2]
    sigma = random.choice(sigma_list)
    img_A = img_A.filter(ImageFilter.GaussianBlur(radius=sigma))
    img_B = img_B.filter(ImageFilter.GaussianBlur(radius=sigma))
</code></pre>
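
<p>To see the bleeding effect in isolation, here is a small self-contained experiment (a synthetic L-shaped stroke, not actual zi2zi data): after GaussianBlur(radius=2), pixels near the stroke that were pure white are no longer white.</p>

```python
from PIL import Image, ImageFilter, ImageDraw

# Draw a black L-shaped stroke on a white canvas.
img = Image.new("L", (32, 32), 255)
d = ImageDraw.Draw(img)
d.line([(8, 8), (8, 24)], fill=0, width=3)    # vertical stroke
d.line([(8, 24), (24, 24)], fill=0, width=3)  # horizontal stroke

blurred = img.filter(ImageFilter.GaussianBlur(radius=2))

# A pixel near the inside of the corner: pure white before the blur,
# darkened afterwards -- this is the "bleeding" that can join strokes.
print(img.getpixel((12, 20)), blurred.getpixel((12, 20)))
```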



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<p>In addition, blur can flatten the curvature of rounded stroke tips: a tip that was originally, say, round=10 in curvature may end up closer to round=8.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://stackoverflow.max-everyday.com/2025/02/image-filter-gaussian-blur-radius-sigma/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Comparing tools for converting images to SVG</title>
		<link>https://stackoverflow.max-everyday.com/2022/05/convert-to-svg-potrace/</link>
					<comments>https://stackoverflow.max-everyday.com/2022/05/convert-to-svg-potrace/#respond</comments>
		
		<dc:creator><![CDATA[max-stackoverflow]]></dc:creator>
		<pubDate>Wed, 25 May 2022 18:39:39 +0000</pubDate>
				<category><![CDATA[Computer Applications]]></category>
		<category><![CDATA[image]]></category>
		<guid isPermaLink="false">https://stackoverflow.max-everyday.com/?p=4033</guid>

					<description><![CDATA[The original image: converted to a bitmap, which looks fine: then converted...]]></description>
										<content:encoded><![CDATA[
<p>The original image:</p>



<figure class="wp-block-image size-full"><img decoding="async" width="786" height="785" src="https://stackoverflow.max-everyday.com/wp-content/uploads/2022/05/Screen-Shot-2022-05-26-at-1.45.49-AM.png" alt="" class="wp-image-4036" srcset="https://stackoverflow.max-everyday.com/wp-content/uploads/2022/05/Screen-Shot-2022-05-26-at-1.45.49-AM.png?v=1653501307 786w, https://stackoverflow.max-everyday.com/wp-content/uploads/2022/05/Screen-Shot-2022-05-26-at-1.45.49-AM-600x600.png?v=1653501307 600w, https://stackoverflow.max-everyday.com/wp-content/uploads/2022/05/Screen-Shot-2022-05-26-at-1.45.49-AM-300x300.png?v=1653501307 300w, https://stackoverflow.max-everyday.com/wp-content/uploads/2022/05/Screen-Shot-2022-05-26-at-1.45.49-AM-768x767.png?v=1653501307 768w" sizes="(max-width: 786px) 100vw, 786px" /></figure>



<p>Converted to a bitmap, which looks quite normal:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="966" height="1024" src="https://stackoverflow.max-everyday.com/wp-content/uploads/2022/05/Screen-Shot-2022-05-26-at-1.55.58-AM-966x1024.png?v=1653501392" alt="" class="wp-image-4037" srcset="https://stackoverflow.max-everyday.com/wp-content/uploads/2022/05/Screen-Shot-2022-05-26-at-1.55.58-AM-966x1024.png?v=1653501392 966w, https://stackoverflow.max-everyday.com/wp-content/uploads/2022/05/Screen-Shot-2022-05-26-at-1.55.58-AM-566x600.png?v=1653501392 566w, https://stackoverflow.max-everyday.com/wp-content/uploads/2022/05/Screen-Shot-2022-05-26-at-1.55.58-AM-768x814.png?v=1653501392 768w, https://stackoverflow.max-everyday.com/wp-content/uploads/2022/05/Screen-Shot-2022-05-26-at-1.55.58-AM.png?v=1653501392 1040w" sizes="(max-width: 966px) 100vw, 966px" /></figure>






<p>Converting back to vector, using potrace:<br><a href="http://potrace.sourceforge.net/#downloading">http://potrace.sourceforge.net/#downloading</a></p>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="786" height="785" src="https://stackoverflow.max-everyday.com/wp-content/uploads/2022/05/Screen-Shot-2022-05-26-at-1.45.32-AM.png" alt="" class="wp-image-4035" srcset="https://stackoverflow.max-everyday.com/wp-content/uploads/2022/05/Screen-Shot-2022-05-26-at-1.45.32-AM.png?v=1653501265 786w, https://stackoverflow.max-everyday.com/wp-content/uploads/2022/05/Screen-Shot-2022-05-26-at-1.45.32-AM-600x600.png?v=1653501265 600w, https://stackoverflow.max-everyday.com/wp-content/uploads/2022/05/Screen-Shot-2022-05-26-at-1.45.32-AM-300x300.png?v=1653501265 300w, https://stackoverflow.max-everyday.com/wp-content/uploads/2022/05/Screen-Shot-2022-05-26-at-1.45.32-AM-768x767.png?v=1653501265 768w" sizes="(max-width: 786px) 100vw, 786px" /></figure>



<p>I tried tuning potrace's parameters:</p>



<pre class="wp-block-code"><code>Algorithm options:
 -z, --turnpolicy &lt;policy&gt;  - how to resolve ambiguities in path decomposition
 -t, --turdsize &lt;n&gt;         - suppress speckles of up to this size (default 2)
 -a, --alphamax &lt;n&gt;         - corner threshold parameter (default 1)
 -n, --longcurve            - turn off curve optimization
 -O, --opttolerance &lt;n&gt;     - curve optimization tolerance (default 0.2)
 -u, --unit &lt;n&gt;             - quantize output to 1/unit pixels (default 10)
 -d, --debug &lt;n&gt;            - produce debugging output of type n (n=1,2,3)</code></pre>



<p>None of them, however, could remove the obtuse corner visible in the image above.</p>



<p>Switching to VTracer:<br><a href="https://github.com/visioncortex/vtracer">https://github.com/visioncortex/vtracer</a></p>



<p>It does fix the obtuse corners on the smaller rounded corners, but it introduces new obtuse corners on the larger curves:</p>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="786" height="785" src="https://stackoverflow.max-everyday.com/wp-content/uploads/2022/05/Screen-Shot-2022-05-26-at-1.47.38-AM.png" alt="" class="wp-image-4038" srcset="https://stackoverflow.max-everyday.com/wp-content/uploads/2022/05/Screen-Shot-2022-05-26-at-1.47.38-AM.png?v=1653501664 786w, https://stackoverflow.max-everyday.com/wp-content/uploads/2022/05/Screen-Shot-2022-05-26-at-1.47.38-AM-600x600.png?v=1653501664 600w, https://stackoverflow.max-everyday.com/wp-content/uploads/2022/05/Screen-Shot-2022-05-26-at-1.47.38-AM-300x300.png?v=1653501664 300w, https://stackoverflow.max-everyday.com/wp-content/uploads/2022/05/Screen-Shot-2022-05-26-at-1.47.38-AM-768x767.png?v=1653501664 768w" sizes="(max-width: 786px) 100vw, 786px" /></figure>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<p>The temporary workaround is to inspect the output by eye first and merge the redundant nodes in FontForge, which smooths the curve out.</p>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<p>Besides potrace, you can also use autotrace, downloadable from:<br><a href="https://github.com/autotrace/autotrace/releases">https://github.com/autotrace/autotrace/releases</a></p>



<p>On a Mac, first create a symbolic link from the terminal:</p>



<pre class="wp-block-preformatted">ln -s /Applications/autotrace.app/Contents/MacOS/autotrace /usr/local/bin</pre>



<p>Then you can convert a bmp file to svg:</p>



<pre class="wp-block-code"><code>autotrace --output-format svg --output-file output_filename.svg input_filename.bmp</code></pre>



<p>The test result made me want to cry: potrace produced an 11 KB file, while autotrace produced a 270 KB file. The .svg looks fine when opened in drawing software, but opened in FontForge it is covered in nodes. @_@;</p>
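
<p>A quick, rough way to quantify "covered in nodes" is to count the path commands in each SVG file. This is a crude regex sketch on an inline example (real code would parse the XML properly):</p>

```python
import re

def count_path_commands(svg_text):
    """Count drawing commands (M, L, C, Q, ...) across all <path d="..."> attributes."""
    total = 0
    for d in re.findall(r'\bd="([^"]*)"', svg_text):
        total += len(re.findall(r'[MmLlHhVvCcSsQqTtAaZz]', d))
    return total

svg = '<svg xmlns="http://www.w3.org/2000/svg"><path d="M0 0 C1 1 2 2 3 3 L4 4 Z"/></svg>'
print(count_path_commands(svg))  # 4
```

<p>Running this over the potrace and autotrace outputs makes the 11 KB vs. 270 KB difference concrete as a node count rather than a file size.</p>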



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<h3 class="wp-block-heading">VTracer</h3>



<p>This is currently a highly recommended open-source choice, written in Rust. Its biggest feature is support for tracing color images, giving richer results than Potrace. There is a web version where you can upload a file to test it, and a command-line tool that can be integrated into a processing pipeline. Its line handling is relatively smooth, making it well suited to logos and illustrations.</p>



<h3 class="wp-block-heading">AutoTrace</h3>



<p>This is a long-standing project with more features than Potrace. It supports color tracing and exposes more parameters for tuning curve precision. However, the project is updated infrequently, and installation on some modern systems can be cumbersome. If the lines Potrace produces feel too stiff, this tool's algorithm is worth a try.</p>



<h3 class="wp-block-heading">Inkscape's built-in tool</h3>



<p>Inkscape is a free vector drawing application that has a built-in bitmap-to-vector feature: look for Trace Bitmap under the Path menu. Under the hood it integrates Potrace, but it adds multi-pass scanning to handle color images. For anyone who prefers not to use the command line, this is the most intuitive alternative.</p>



<h3 class="wp-block-heading">Vector Magic</h3>



<p>This is a paid commercial product that also offers an online conversion service. Although it costs money, its automatic recognition and detail preservation are generally better than the open-source tools. It automatically cleans up noise and handles anti-aliased edges, producing very clean SVG paths. If you need very high accuracy and the budget allows, it is widely regarded as the strongest option available.</p>



<h3 class="wp-block-heading">Library options for developers</h3>



<p>If you are writing code, Python has a library called osra you can look at. There are also some deep-learning-based approaches appearing on GitHub, for example AI-driven vectorization, but such tools usually require considerably more compute.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://stackoverflow.max-everyday.com/2022/05/convert-to-svg-potrace/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Python PIL: How to save cropped image?</title>
		<link>https://stackoverflow.max-everyday.com/2021/06/python-pil-how-to-save-cropped-image/</link>
					<comments>https://stackoverflow.max-everyday.com/2021/06/python-pil-how-to-save-cropped-image/#respond</comments>
		
		<dc:creator><![CDATA[max-stackoverflow]]></dc:creator>
		<pubDate>Wed, 09 Jun 2021 18:13:52 +0000</pubDate>
				<category><![CDATA[Python Notes]]></category>
		<category><![CDATA[image]]></category>
		<category><![CDATA[Python]]></category>
		<guid isPermaLink="false">https://stackoverflow.max-everyday.com/?p=3792</guid>

					<description><![CDATA[I needed to extract one region of an image; an example of cropping looks like...]]></description>
										<content:encoded><![CDATA[
<p>I needed to extract a particular region from an image. An example of cropping looks like this:</p>



<pre class="wp-block-preformatted">from PIL import Image

imageObject = Image.open(saved_image_path)
# The crop box is (left, upper, right, lower), not (x, y, width, height)
box = (x_offset, y_offset, x_offset + width, y_offset + height)
crop = imageObject.crop(box)
crop.save(saved_image_path, format)</pre>
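
<p>The box semantics trip people up, so here is a self-contained sanity check with a synthetic image:</p>

```python
from PIL import Image

img = Image.new("RGB", (100, 80), "white")
box = (10, 20, 60, 70)  # left, upper, right, lower
crop = img.crop(box)
print(crop.size)  # (50, 50): width = right - left, height = lower - upper
```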



<hr class="wp-block-separator"/>



<p>Source:<br><a href="https://stackoverflow.com/questions/6456479/python-pil-how-to-save-cropped-image">https://stackoverflow.com/questions/6456479/python-pil-how-to-save-cropped-image</a></p>



<p>Python Pillow &#8211; Cropping an Image<br><a href="https://www.tutorialspoint.com/python_pillow/python_pillow_cropping_an_image.htm">https://www.tutorialspoint.com/python_pillow/python_pillow_cropping_an_image.htm</a></p>
]]></content:encoded>
					
					<wfw:commentRss>https://stackoverflow.max-everyday.com/2021/06/python-pil-how-to-save-cropped-image/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>[PyTorch] output with shape [1, 28, 28] doesn&#8217;t match the broadcast shape [3, 28, 28]</title>
		<link>https://stackoverflow.max-everyday.com/2021/06/pytorch-output-with-shape-1-28-28-doesnt-match-the-broadcast-shape-3-28-28/</link>
					<comments>https://stackoverflow.max-everyday.com/2021/06/pytorch-output-with-shape-1-28-28-doesnt-match-the-broadcast-shape-3-28-28/#respond</comments>
		
		<dc:creator><![CDATA[max-stackoverflow]]></dc:creator>
		<pubDate>Thu, 03 Jun 2021 20:42:56 +0000</pubDate>
				<category><![CDATA[Python Notes]]></category>
		<category><![CDATA[Machine Learning Notes]]></category>
		<category><![CDATA[colab]]></category>
		<category><![CDATA[image]]></category>
		<category><![CDATA[Python]]></category>
		<guid isPermaLink="false">https://stackoverflow.max-everyday.com/?p=3766</guid>

					<description><![CDATA[This error is caused by the image format, whether it is a color or...]]></description>
										<content:encoded><![CDATA[
<p>This error is caused by the image's format: whether it is a color image or a grayscale (black-and-white) one.</p>



<p>To clarify: if the image has three channels (RGB), the normalization needs three values for the mean (and std), e.g. [0.5, 0.5, 0.5], one per channel. If the image is grayscale with a single channel, the mean should be a single value such as [0.5]. A mismatch between the number of channels and the number of normalization values triggers this broadcast error.</p>



<p>Source:<br><a href="https://github.com/yunjey/pytorch-tutorial/issues/161">https://github.com/yunjey/pytorch-tutorial/issues/161</a></p>



<hr class="wp-block-separator"/>



<p>There are several fixes: convert the image to color so it matches the code, or change the program's flow. If you are reading the image with PIL in Python, the conversion looks like this:</p>



<pre class="wp-block-code"><code>from PIL import Image

img = Image.open(GIF_FILENAME)
# Use "RGB", not "RGBA": JPEG cannot store an alpha channel
rgbimg = Image.new("RGB", img.size)
rgbimg.paste(img)
rgbimg.save('foo.jpg')</code></pre>
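
<p>If the goal is simply to get three channels, Pillow's convert() is an even shorter route. A small sketch with a synthetic single-channel image:</p>

```python
from PIL import Image

gray = Image.new("L", (28, 28), 128)   # single-channel, like MNIST
rgb = gray.convert("RGB")              # the value is copied into all three channels

print(gray.mode, rgb.mode)             # L RGB
print(rgb.getpixel((0, 0)))            # (128, 128, 128)
```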



]]></content:encoded>
					
					<wfw:commentRss>https://stackoverflow.max-everyday.com/2021/06/pytorch-output-with-shape-1-28-28-doesnt-match-the-broadcast-shape-3-28-28/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>NumPy basics: reshape</title>
		<link>https://stackoverflow.max-everyday.com/2020/04/numpy-reshape/</link>
					<comments>https://stackoverflow.max-everyday.com/2020/04/numpy-reshape/#respond</comments>
		
		<dc:creator><![CDATA[max-stackoverflow]]></dc:creator>
		<pubDate>Wed, 01 Apr 2020 09:52:32 +0000</pubDate>
				<category><![CDATA[Python Notes]]></category>
		<category><![CDATA[image]]></category>
		<category><![CDATA[math]]></category>
		<category><![CDATA[Python]]></category>
		<guid isPermaLink="false">http://stackoverflow.max-everyday.com/?p=3366</guid>

					<description><![CDATA[When processing an image, to inspect the contents of one region, the traditional...]]></description>
										<content:encoded><![CDATA[
<p>When processing an image, if you want to examine one region of it, the traditional approach (in the COBOL world, for example) is to write the loop yourself, queue up the output, and handle the row splitting manually. Now, with numpy, a single reshape does the whole job. @_@; So convenient.</p>



<p>Comparing the two outputs:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="553" height="270" src="https://stackoverflow.max-everyday.com/wp-content/uploads/2020/04/Screen-Shot-2020-04-01-at-17.45.02.png" alt="" class="wp-image-3368"/></figure>



<p>The top half breaks the lines manually, six values per row.</p>



<p>The bottom half uses a single numpy call to split the original data. The result is identical.</p>



<p>The old way:</p>



<pre class="wp-block-preformatted"># values: sequence of 1-tuples; diff_x + 1 values go on each row
row = ""
idx = 0
values_formated = []
for item in values:
    idx += 1
    flag = 1
    if item[0] == 0:
        flag = 0
    row += " %d" % (flag)
    values_formated.append(flag)
    if idx % (diff_x + 1) == 0:
        print((diff_x + 1), row)
        row = ""</pre>



<hr class="wp-block-separator"/>



<p>The new way:</p>



<pre class="wp-block-preformatted">np_re = np.array(values_formated)
np2 = np_re.reshape([5,6])
print("np2:", np2)</pre>
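
<p>Putting the two together as a runnable comparison (with made-up flag data standing in for the original values):</p>

```python
import numpy as np

# 30 flags, pretending diff_x + 1 == 6 columns
values_formated = [1, 0, 1, 1, 0, 0] * 5

np2 = np.array(values_formated).reshape(5, 6)
print(np2.shape)  # (5, 6)
print(np2[0])     # [1 0 1 1 0 0]
```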
]]></content:encoded>
					
					<wfw:commentRss>https://stackoverflow.max-everyday.com/2020/04/numpy-reshape/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Working with polygons in an image</title>
		<link>https://stackoverflow.max-everyday.com/2020/04/get-pixel-values-inside-of-a-rectangle-within-an-image/</link>
					<comments>https://stackoverflow.max-everyday.com/2020/04/get-pixel-values-inside-of-a-rectangle-within-an-image/#respond</comments>
		
		<dc:creator><![CDATA[max-stackoverflow]]></dc:creator>
		<pubDate>Wed, 01 Apr 2020 02:52:29 +0000</pubDate>
				<category><![CDATA[Python Notes]]></category>
		<category><![CDATA[image]]></category>
		<category><![CDATA[math]]></category>
		<category><![CDATA[Python]]></category>
		<guid isPermaLink="false">http://stackoverflow.max-everyday.com/?p=3361</guid>

					<description><![CDATA[The original need was to read the values inside a quadrilateral in an image, and I happened...]]></description>
										<content:encoded><![CDATA[
<p>The original need was to read the pixel values inside a quadrilateral within an image, and I happened to find a practical example. OpenCV and numpy seem to have become must-learn tools; so many projects use them.</p>



<hr class="wp-block-separator"/>



<p>How to get pixel values inside of a rectangle within an image<br><a href="https://stackoverflow.com/questions/58790535/how-to-get-pixel-values-inside-of-a-rectangle-within-an-image">https://stackoverflow.com/questions/58790535/how-to-get-pixel-values-inside-of-a-rectangle-within-an-image</a></p>



<p>You can do that with Python/OpenCV by first drawing a white filled polygon on a black background as a mask from your four points. Then use np.where to locate, and print, all the points in the image corresponding to the white pixels in the mask.</p>



<p>Input:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="256" height="256" src="https://stackoverflow.max-everyday.com/wp-content/uploads/2020/04/Giazs.png" alt="" class="wp-image-3362"/></figure>



<pre class="wp-block-code"><code>import cv2
import numpy as np

# read image
image = cv2.imread('lena.png')

# create mask with zeros
mask = np.zeros((image.shape), dtype=np.uint8)

# define points (as small diamond shape)
pts = np.array( &#91;&#91;&#91;25,20],&#91;30,25],&#91;25,30],&#91;20,25]]], dtype=np.int32 )
cv2.fillPoly(mask, pts, (255,255,255) )

# get color values
values = image&#91;np.where((mask == (255,255,255)).all(axis=2))]
print(values)

# save mask
cv2.imwrite('diamond_mask.png', mask)

cv2.imshow('image', image)
cv2.imshow('mask', mask)
cv2.waitKey()</code></pre>



<p>Mask:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="256" height="256" src="https://stackoverflow.max-everyday.com/wp-content/uploads/2020/04/MMsO0.png" alt="" class="wp-image-3363"/></figure>



<p>Results:</p>



<pre class="wp-block-code"><code> &#91;&#91;108 137 232]
 &#91;104 134 232]
 &#91;108 136 231]
 &#91;106 134 231]
 &#91;109 133 228]
 &#91;108 136 229]
 &#91;109 137 230]
 &#91;110 135 232]
 &#91;103 126 230]
 &#91;112 134 228]
 &#91;114 136 228]
 &#91;111 138 230]
 &#91;110 137 233]
 &#91;103 135 234]
 &#91;103 126 230]
 &#91;101 120 226]
 &#91;108 137 230]
 &#91;112 133 228]
 &#91;114 136 227]
 &#91;115 139 232]
 &#91;112 137 232]
 &#91;105 134 233]
 &#91;102 128 232]
 &#91; 98 119 226]
 &#91; 93 105 220]
 &#91;108 139 230]
 &#91;110 137 230]
 &#91;112 135 230]
 &#91;113 135 230]
 &#91;111 138 231]
 &#91;112 139 232]
 &#91;109 134 233]
 &#91;101 128 232]
 &#91;100 120 224]
 &#91; 90 104 221]
 &#91; 87  95 211]
 &#91;111 138 229]
 &#91;109 135 231]
 &#91;109 136 230]
 &#91;113 141 233]
 &#91;110 139 233]
 &#91;105 136 234]
 &#91;101 127 232]
 &#91; 95 117 225]
 &#91; 90 107 220]
 &#91;110 137 231]
 &#91;110 138 231]
 &#91;107 140 236]
 &#91;110 139 233]
 &#91;104 135 234]
 &#91;105 130 231]
 &#91; 92 116 227]
 &#91;114 141 234]
 &#91;112 142 235]
 &#91;111 140 235]
 &#91;111 138 234]
 &#91;110 132 232]
 &#91;114 140 234]
 &#91;108 140 233]
 &#91;107 134 233]
 &#91;107 140 235]]</code></pre>



<hr class="wp-block-separator"/>



<p>The PIL equivalent:</p>



<p>How to read an image file as ndarray</p>



<pre class="wp-block-preformatted">from PIL import Image
import numpy as np

im = np.array(Image.open('data/src/lena_square.png'))

print(im.dtype)
# uint8

print(im.ndim)
# 3

print(im.shape)
# (512, 512, 3)</pre>
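
<p>The same mask trick can also be done without OpenCV, using only Pillow and numpy. A sketch on a synthetic solid-color image rather than lena:</p>

```python
from PIL import Image, ImageDraw
import numpy as np

# Synthetic 50x50 image filled with one BGR-style color triple.
img = np.full((50, 50, 3), (10, 20, 30), dtype=np.uint8)

# Draw the diamond as a filled polygon on a single-channel mask.
mask = Image.new("L", (50, 50), 0)
ImageDraw.Draw(mask).polygon([(25, 20), (30, 25), (25, 30), (20, 25)], fill=255)

# Boolean indexing pulls out the pixel values under the mask.
values = img[np.array(mask) == 255]
print(values.shape)  # (N, 3): one row per masked pixel
```

<p>This avoids the cv2 dependency when all you need is the masked pixel values.</p>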
]]></content:encoded>
					
					<wfw:commentRss>https://stackoverflow.max-everyday.com/2020/04/get-pixel-values-inside-of-a-rectangle-within-an-image/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
