Training a Stable Diffusion LoRA to Teach AI the Zen Maru Gothic Style

2025-02-26 / 2025-02-27

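To build datasets/zenmaru_dataset for training a Stable Diffusion LoRA on the Zen Maru Gothic style, prepare the dataset with the steps below.

📌 1. Define the dataset requirements

Image format: PNG or JPG (PNG recommended, since it is lossless and avoids compression artifacts)
Image size: 512×512 or 1024×1024 (use one size throughout)
Image count: at least 200 to 500 (more images generally give better results)
Contents:
- Single-character images rendered in Zen Maru Gothic
- Variations in size and weight
- A matching text caption for each image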

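📌 2. Download the Zen Maru Gothic font

You can download it from Google Fonts or another source:

Google Fonts 👉 Zen Maru Gothic
Other open-source font sites

After downloading, save the .ttf font files in a fonts folder next to your scripts, for example:

fonts/ZenMaruGothic-Regular.ttf
fonts/ZenMaruGothic-Bold.ttf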
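Before rendering, it can help to confirm that the font actually contains a glyph for every character you plan to use, since a missing glyph usually renders as an empty box and would pollute the dataset. The sketch below is one possible check, not part of the original workflow: `missing_characters` and `load_cmap` are hypothetical helper names, and `load_cmap` assumes the fontTools package is installed.

```python
def missing_characters(cmap, characters):
    """Return the characters whose code points are absent from a font's cmap."""
    return [ch for ch in characters if ord(ch) not in cmap]


def load_cmap(font_path):
    """Read the Unicode-to-glyph-name mapping from a font file (needs fontTools)."""
    from fontTools.ttLib import TTFont  # lazy import: only needed when reading a real font

    return TTFont(font_path).getBestCmap()


# Example usage (assumes the font file exists at this path):
# cmap = load_cmap("fonts/ZenMaruGothic-Regular.ttf")
# print(missing_characters(cmap, "ABC一二三"))
```

An empty list means every requested character has a glyph; anything else is worth removing from the character set before generating images.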

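📌 3. Batch-generate the glyph images

We need to render images from Zen Maru Gothic so that the AI can learn the font's style.

🔹 Generating glyph images with Python

The following Python script automatically renders Zen Maru Gothic glyph images and saves them to the datasets/zenmaru_dataset folder.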

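from PIL import Image, ImageDraw, ImageFont
import os

# Font file and output folder
font_path = "fonts/ZenMaruGothic-Regular.ttf"  # your Zen Maru Gothic font
output_dir = "datasets/zenmaru_dataset"
os.makedirs(output_dir, exist_ok=True)

# Characters to render (add more as needed)
characters = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz一二三四五六七八九十"

# Load the font once; 400 is the font size in pixels
font = ImageFont.truetype(font_path, 400)

# Render one image per character
for char in characters:
    img = Image.new("RGB", (512, 512), (255, 255, 255))  # white background
    draw = ImageDraw.Draw(img)

    try:
        # textbbox replaces textsize, which was removed in Pillow 10
        left, top, right, bottom = draw.textbbox((0, 0), char, font=font)
        x = (512 - (right - left)) / 2 - left
        y = (512 - (bottom - top)) / 2 - top
        draw.text((x, y), char, font=font, fill=(0, 0, 0))  # centered, black
    except Exception as e:
        print(f"Could not render {char}: {e}")
        continue

    # Caution: on case-insensitive file systems (Windows, default macOS),
    # "A.png" and "a.png" are the same file, so one overwrites the other.
    img.save(os.path.join(output_dir, f"{char}.png"))  # save the image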

📂 Resulting file structure

datasets/
  ├── zenmaru_dataset/
  │   ├── A.png
  │   ├── B.png
  │   ├── C.png
  │   ├── 一.png
  │   ├── 二.png
  │   ├── ...
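The requirements in step 1 ask for variations in size and weight, while the script in step 3 renders a single size from one font file. One way to cover the variations is sketched below. The helper names `variant_specs` and `render`, the weight labels, and the code-point file naming are all assumptions made for illustration; naming files by code point (for example `U+0041`) also sidesteps collisions between `A.png` and `a.png` on case-insensitive file systems.

```python
from itertools import product


def variant_specs(characters, font_files, sizes):
    """Yield (file_stem, char, font_file, size) for every combination.

    font_files maps a weight label (such as "regular") to a font path.
    File stems use the Unicode code point so that upper- and lowercase
    letters never share a file name.
    """
    for char, (weight, font_file), size in product(characters, font_files.items(), sizes):
        stem = f"U+{ord(char):04X}_{weight}_{size}"
        yield stem, char, font_file, size


def render(char, font_file, size, canvas=512):
    """Render one centered black glyph on a white canvas (needs Pillow 8.0+)."""
    from PIL import Image, ImageDraw, ImageFont  # lazy import: only needed when rendering

    img = Image.new("RGB", (canvas, canvas), (255, 255, 255))
    draw = ImageDraw.Draw(img)
    font = ImageFont.truetype(font_file, size)
    left, top, right, bottom = draw.textbbox((0, 0), char, font=font)
    position = ((canvas - (right - left)) / 2 - left, (canvas - (bottom - top)) / 2 - top)
    draw.text(position, char, font=font, fill=(0, 0, 0))
    return img


# Example usage (assumes both font files exist):
# fonts = {"regular": "fonts/ZenMaruGothic-Regular.ttf", "bold": "fonts/ZenMaruGothic-Bold.ttf"}
# for stem, char, font_file, size in variant_specs("ABC一二三", fonts, [300, 400]):
#     render(char, font_file, size).save(f"datasets/zenmaru_dataset/{stem}.png")
```

If you adopt this naming, the caption files need the same stems (for example `U+0041_regular_300.txt`) so that each image still pairs with its caption.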

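📌 4. Prepare the captions

Stable Diffusion LoRA training needs a matching caption for each image to help the AI understand what the image shows.

🔹 Option 1: Create `.txt` captions by hand

Each image needs a .txt file with the same base name, for example:

datasets/zenmaru_dataset/
  ├── A.png
  ├── A.txt  (contents: "Zen Maru Gothic letter A")
  ├── 一.png
  ├── 一.txt (contents: "Zen Maru Gothic style character 一")

You can also generate the .txt captions automatically with Python: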

characters = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz一二三四五六七八九十"

for char in characters:
    with open(f"datasets/zenmaru_dataset/{char}.txt", "w", encoding="utf-8") as f:
        f.write(f"Zen Maru Gothic style character {char}")

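📌 5. Verify the dataset format

Check the following:
✅ All images are PNG or JPG
✅ All images are the same size (512×512 or 1024×1024)
✅ Every image has a matching .txt caption
✅ File names match between image and caption (A.png pairs with A.txt)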
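The checklist above can be automated. The following is a minimal sketch that uses only the standard library: `check_dataset` and `png_size` are hypothetical helper names, and for simplicity the size check covers PNG files only (it reads the width and height straight from the PNG header), so JPG dimensions are not verified here.

```python
import struct
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path):
    """Read (width, height) from a PNG header without decoding the image."""
    with open(path, "rb") as f:
        header = f.read(24)
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path} is not a valid PNG file")
    return struct.unpack(">II", header[16:24])


def check_dataset(folder, expected_size=(512, 512)):
    """Return a list of human-readable problems found in the dataset folder."""
    problems = []
    for path in sorted(Path(folder).iterdir()):
        suffix = path.suffix.lower()
        if suffix == ".txt":
            # A caption is fine as long as some image shares its base name
            if not any(path.with_suffix(s).exists() for s in IMAGE_SUFFIXES):
                problems.append(f"{path.name}: caption without an image")
            continue
        if suffix not in IMAGE_SUFFIXES:
            problems.append(f"{path.name}: unexpected file type")
            continue
        if not path.with_suffix(".txt").exists():
            problems.append(f"{path.name}: missing caption")
        if suffix == ".png":  # JPG dimensions are not checked in this sketch
            try:
                width, height = png_size(path)
            except ValueError as error:
                problems.append(str(error))
                continue
            if (width, height) != tuple(expected_size):
                problems.append(f"{path.name}: size {width}x{height}, expected "
                                f"{expected_size[0]}x{expected_size[1]}")
    return problems


# Example usage:
# for problem in check_dataset("datasets/zenmaru_dataset"):
#     print(problem)
```

An empty result means the folder passes the checks this sketch covers.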

📌 6. Once the dataset is complete, start LoRA training

You can now run the LoRA training command:

python train_db.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-2-1" \
  --instance_data_dir="datasets/zenmaru_dataset" \
  --output_dir="output/zenmaru_lora" \
  --resolution=512 \
  --train_batch_size=2 \
  --gradient_accumulation_steps=4 \
  --learning_rate=1e-4 \
  --max_train_steps=2000 \
  --lr_scheduler="cosine" \
  --lora_rank=32 \
  --use_8bit_adam

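🎯 Final result

You have now:
✅ Downloaded the Zen Maru Gothic font
✅ Generated glyph images with Python
✅ Created the matching .txt captions
✅ Prepared the LoRA training dataset

With this in place, a Stable Diffusion LoRA can learn the Zen Maru Gothic style! 🚀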

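The GitHub repo I downloaded does not contain a train_db.py file.

train_db.py is not an official built-in command. The name usually refers to a DreamBooth training script, but the official Stable Diffusion release does not provide train_db.py. To train a LoRA, use the kohya-ss LoRA training script, or write your own LoRA training program on top of diffusers.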

📌 1. Train the LoRA with `kohya-ss`

For LoRA training, the recommended option is train_network.py from kohya-ss (currently the most widely used LoRA training script).

🔹 Install `kohya-ss`

git clone https://github.com/kohya-ss/sd-scripts.git
cd sd-scripts
pip install -r requirements.txt

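🔹 Train the LoRA with `train_network.py`

python train_network.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-2-1" \
  --v2 \
  --v_parameterization \
  --network_module=networks.lora \
  --train_data_dir="datasets/zenmaru_dataset" \
  --caption_extension=".txt" \
  --output_dir="output/zenmaru_lora" \
  --resolution=512,512 \
  --network_dim=32 \
  --learning_rate=1e-4 \
  --train_batch_size=2 \
  --max_train_steps=2000 \
  --lr_scheduler="cosine" \
  --use_8bit_adam \
  --save_every_n_steps=500

🔍 Parameters

--v2, --v_parameterization 👉 needed for Stable Diffusion 2.x; the second flag applies to the 768 v-prediction checkpoints such as stable-diffusion-2-1
--network_module=networks.lora 👉 selects the LoRA network implementation
--train_data_dir 👉 your Zen Maru Gothic image folder (datasets/zenmaru_dataset); sd-scripts normally expects the images inside a subfolder named <repeats>_<name>, for example 10_zenmaru
--caption_extension=".txt" 👉 tells the script to read captions from the .txt files created earlier
--network_dim=32 👉 the LoRA rank (typically 16 or 32)
--resolution=512,512 👉 training image size
--train_batch_size=2 👉 training batch size
--learning_rate=1e-4 👉 learning rate for the LoRA fine-tune
--max_train_steps=2000 👉 total number of training steps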
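To get a feel for what `--train_batch_size` and `--max_train_steps` imply for a small dataset, the arithmetic can be sketched as follows. This is a rough approximation rather than the trainer's exact bookkeeping; `training_schedule` is a hypothetical helper, and `repeats` and `grad_accum` are illustrative parameters that default to 1.

```python
import math


def training_schedule(num_images, batch_size, max_train_steps, repeats=1, grad_accum=1):
    """Estimate steps per epoch and how many epochs max_train_steps covers."""
    images_per_step = batch_size * grad_accum
    steps_per_epoch = math.ceil(num_images * repeats / images_per_step)
    epochs = max_train_steps / steps_per_epoch
    return steps_per_epoch, epochs


# The 62 characters from step 3 (26 + 26 + 10), batch size 2, 2000 steps
steps_per_epoch, epochs = training_schedule(62, 2, 2000)
print(f"{steps_per_epoch} steps per epoch, about {epochs:.1f} epochs")
```

With only 62 images, 2000 steps means each image is seen roughly 64 times, which is one reason the requirements in step 1 ask for a few hundred images.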

📌 2. Training the LoRA with `diffusers`

If you would rather not use kohya-ss, you can write your own Python script to train the LoRA.

🔹 Install `diffusers`

pip install diffusers transformers accelerate bitsandbytes safetensors peft

🔹 Write your own LoRA training script

from diffusers import StableDiffusionPipeline
from peft import LoraConfig, get_peft_model
import torch

# Load the Stable Diffusion model
model_id = "stabilityai/stable-diffusion-2-1"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe.to("cuda")

# LoRA configuration
config = LoraConfig(
    r=32,
    lora_alpha=16,
    lora_dropout=0.1,
    bias="none",
    # The UNet attention projections are named to_q / to_k / to_v;
    # q_proj / v_proj are text-encoder names and match nothing in the UNet
    target_modules=["to_q", "to_v"],
)
pipe.unet = get_peft_model(pipe.unet, config)

# Train the LoRA (a complete DataLoader still has to be added here)

This approach gives you full control over the LoRA training process.
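The DataLoader that the snippet leaves open has to start from pairs of image files and captions. A minimal sketch of that first step, using only the standard library, is shown here. `load_caption_pairs` is a hypothetical helper, and the fallback caption for images without a .txt file is an assumption.

```python
from pathlib import Path


def load_caption_pairs(folder, default_caption="Zen Maru Gothic style"):
    """Return a sorted list of (image_path, caption) pairs from a dataset folder."""
    pairs = []
    for image_path in sorted(Path(folder).iterdir()):
        if image_path.suffix.lower() not in {".png", ".jpg", ".jpeg"}:
            continue
        caption_path = image_path.with_suffix(".txt")
        if caption_path.exists():
            caption = caption_path.read_text(encoding="utf-8").strip()
        else:
            caption = default_caption  # fall back when no caption file exists
        pairs.append((image_path, caption))
    return pairs


# Example usage:
# for image_path, caption in load_caption_pairs("datasets/zenmaru_dataset"):
#     print(image_path.name, "->", caption)
```

From here, a `torch.utils.data.Dataset` would typically open each image, apply the resize and normalize transforms, tokenize the caption, and hand both to a `DataLoader` for batching.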

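📌 3. Where does `train_db.py` come from?

It is not an official training script; it usually comes from a DreamBooth-related repo (kohya-ss, for example) that ships its own script.
If you downloaded a particular WebUI build, it may bundle a similar DreamBooth training script.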

✅ Summary

To train a LoRA:
- Recommended: kohya-ss 👉 train_network.py
- Example command:
  python train_network.py --train_data_dir="datasets/zenmaru_dataset"
  --network_dim=32 ...
To write your own LoRA training script:
- Use diffusers to fine-tune Stable Diffusion
train_db.py is not an official script, so check whether your repo actually contains it.

Below is a diffusers-based LoRA training script for Stable Diffusion 2.1 that you can use for Zen Maru Gothic style training.
The script trains DreamBooth-style and uses peft to implement LoRA (Low-Rank Adaptation).

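📌 LoRA training script (diffusers-based)

Requirements:

Python 3.10 or later
CUDA (NVIDIA GPU)
Hugging Face diffusers + peft
The Zen Maru Gothic training dataset

Key features:

Built on the Hugging Face diffusers framework
Uses LoRA to reduce VRAM requirements
Uses the AdamW optimizer
Supports fp16 (mixed-precision) training
Saves the trained weights as a .pth file (WebUI expects a kohya-style LoRA file, so the .pth file likely needs converting before WebUI can load it)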
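The VRAM saving comes from how few weights LoRA trains. For a linear layer with a d_out × d_in weight matrix, LoRA freezes that matrix and learns two small matrices of shapes d_out × r and r × d_in instead. The sketch below works through the arithmetic; `lora_param_counts` is a hypothetical helper, and the 1024 × 1024 layer size is an illustrative assumption rather than a measured value from the model.

```python
def lora_param_counts(d_in, d_out, rank):
    """Return (full_weight_params, lora_trainable_params) for one linear layer."""
    full = d_in * d_out            # frozen base weight matrix
    lora = rank * (d_in + d_out)   # the two low-rank matrices that get trained
    return full, lora


full, lora = lora_param_counts(1024, 1024, 8)
print(f"full: {full:,}  LoRA: {lora:,}  ratio: {lora / full:.2%}")
```

Because only the low-rank matrices receive gradients and optimizer state, the memory needed for training drops sharply even though the frozen base weights still have to be loaded.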

📜 Full LoRA training script

import os

import torch
from torch import nn
from torch.optim import AdamW
from torchvision import transforms
from PIL import Image
from tqdm import tqdm
from diffusers import StableDiffusionPipeline, DDPMScheduler
from peft import LoraConfig, get_peft_model

# -------------------------
# 🔹 Training parameters
# -------------------------
PRETRAINED_MODEL_NAME = "stabilityai/stable-diffusion-2-1"
DATASET_PATH = "C:\\AI\\datasets\\zenmaru_dataset\\instance_images"
OUTPUT_DIR = "C:/AI/output/zenmaru_lora/"
RESOLUTION = 512
BATCH_SIZE = 1
EPOCHS = 1
LEARNING_RATE = 1e-4
LORA_RANK = 8  # LoRA rank (low-rank dimension)
USE_FP16 = True  # train with half precision

# -------------------------
# 🔹 Check GPU availability
# -------------------------
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"✅ Using device: {device}")

# fp16 only makes sense on a GPU; fall back to float32 otherwise
use_amp = USE_FP16 and device == "cuda"
dtype = torch.float16 if use_amp else torch.float32

# -------------------------
# 🔹 Load the pretrained model
# -------------------------
pipe = StableDiffusionPipeline.from_pretrained(PRETRAINED_MODEL_NAME, torch_dtype=dtype).to(device)
unet = pipe.unet
vae = pipe.vae
text_encoder = pipe.text_encoder
tokenizer = pipe.tokenizer

# Use the noise schedule the model was trained with (SD 2.1 uses v-prediction)
noise_scheduler = DDPMScheduler.from_pretrained(PRETRAINED_MODEL_NAME, subfolder="scheduler")

# -------------------------
# 🔹 Build the LoRA configuration
# -------------------------
lora_config = LoraConfig(
    r=LORA_RANK, lora_alpha=16, target_modules=["to_q", "to_k", "to_v"], lora_dropout=0.1, bias="none"
)
unet = get_peft_model(unet, lora_config)  # freezes the base weights and adds trainable LoRA layers
# Keep the trainable LoRA weights in float32 so AdamW updates stay stable under fp16
for param in unet.parameters():
    if param.requires_grad:
        param.data = param.data.float()
unet.print_trainable_parameters()  # show the trainable parameter count

# -------------------------
# 🔹 Load the dataset
# -------------------------
def load_images(data_path, resolution=512):
    image_files = sorted(
        os.path.join(data_path, f) for f in os.listdir(data_path)
        if f.lower().endswith((".png", ".jpg", ".jpeg"))
    )
    preprocess = transforms.Compose([
        transforms.Resize((resolution, resolution)),
        transforms.ToTensor(),
        transforms.Normalize([0.5], [0.5]),  # scale RGB to [-1, 1], as the VAE expects
    ])

    images = [preprocess(Image.open(f).convert("RGB")) for f in image_files]
    return torch.stack(images)

train_images = load_images(DATASET_PATH, RESOLUTION)  # kept on the CPU; batches move to the GPU in the loop
print(f"✅ Loaded {train_images.shape[0]} training images")

# -------------------------
# 🔹 Set up the optimizer
# -------------------------
optimizer = AdamW([p for p in unet.parameters() if p.requires_grad], lr=LEARNING_RATE)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)  # loss scaling for fp16

# -------------------------
# 🔹 Training loop
# -------------------------

print("🚀 Starting LoRA training...")
unet.train()
for epoch in range(EPOCHS):
    loop = tqdm(range(0, len(train_images), BATCH_SIZE), desc=f"Epoch {epoch+1}/{EPOCHS}")
    for i in loop:
        batch = train_images[i:i + BATCH_SIZE].to(device, dtype=dtype)
        bsz = batch.shape[0]  # the last batch may be smaller than BATCH_SIZE

        with torch.no_grad():
            # Encode images into the 4-channel latent space the UNet operates in
            latents = vae.encode(batch).latent_dist.sample() * vae.config.scaling_factor

            # Build the text embeddings
            text_inputs = tokenizer(
                ["Zen Maru Gothic Style"] * bsz, padding="max_length",
                max_length=tokenizer.model_max_length, truncation=True, return_tensors="pt",
            ).to(device)
            text_embeddings = text_encoder(text_inputs.input_ids)[0]

        # Sample random timesteps (must be long)
        timesteps = torch.randint(
            0, noise_scheduler.config.num_train_timesteps, (bsz,), device=device
        ).long()

        # Add random noise to the latents
        noise = torch.randn_like(latents)
        noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)

        # The training target depends on how the model was parameterized
        if noise_scheduler.config.prediction_type == "v_prediction":
            target = noise_scheduler.get_velocity(latents, noise, timesteps)
        else:
            target = noise

        # UNet forward pass
        with torch.autocast(device_type=device, enabled=use_amp):
            output = unet(noisy_latents, timesteps, encoder_hidden_states=text_embeddings).sample

        # Compute the loss in float32
        loss = nn.functional.mse_loss(output.float(), target.float())

        optimizer.zero_grad()
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()

        loop.set_postfix(loss=loss.item())

# -------------------------
# 🔹 Save the model
# -------------------------
os.makedirs(OUTPUT_DIR, exist_ok=True)
output_path = os.path.join(OUTPUT_DIR, "zenmaru_lora.pth")
torch.save(unet.state_dict(), output_path)  # state dict of the LoRA-wrapped UNet
print(f"✅ Training complete, model saved to {output_path}")
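For reference, the noise-adding and loss steps in the training loop correspond to the standard diffusion training objective, written out here as a sketch. The symbols are the usual ones and are not taken from the script: $x_0$ is the clean input, $\epsilon$ the sampled noise, $t$ the timestep, $c$ the text embedding, $\bar{\alpha}_t$ the cumulative noise-schedule product, and $\epsilon_\theta$ the UNet.

```latex
% What noise_scheduler.add_noise computes
x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,
\qquad \epsilon \sim \mathcal{N}(0, I)

% Noise-prediction loss (the MSE between the UNet output and the noise)
\mathcal{L}_{\epsilon} = \mathbb{E}_{x_0,\, \epsilon,\, t}
  \left[ \lVert \epsilon - \epsilon_\theta(x_t, t, c) \rVert_2^2 \right]

% Target used instead of \epsilon by v-prediction models
v_t = \sqrt{\bar{\alpha}_t}\, \epsilon - \sqrt{1 - \bar{\alpha}_t}\, x_0
```

The 768-pixel Stable Diffusion 2.1 checkpoint is a v-prediction model, so for it the regression target is $v_t$ rather than $\epsilon$; in diffusers that target is available through the scheduler's `get_velocity` method.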

📌 How to use

1️⃣ Install the required packages

Run the following commands to install diffusers, peft, and the other dependencies:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118  # for CUDA 11.8
pip install diffusers transformers peft accelerate datasets tqdm

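2️⃣ Prepare the data

Make sure your Zen Maru Gothic training images are located in the folder that DATASET_PATH points to:

C:\AI\datasets\zenmaru_dataset\instance_images\

and that the folder contains PNG or JPG images.

3️⃣ Run the training

Run:

python train_lora.py

(Save the script as train_lora.py first.)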

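📌 Training output

When training finishes, the weights file zenmaru_lora.pth is written to C:\AI\output\zenmaru_lora\.

How do I use it?

Stable Diffusion WebUI
- WebUI loads LoRA files from stable-diffusion-webui/models/Lora/ and expects a kohya-style file (typically .safetensors), so the .pth state dict saved by this script likely needs converting first; a LoRA trained with kohya-ss train_network.py can be placed there directly
- Then enter in the WebUI prompt: <lora:zenmaru_lora:1>
- This applies the Zen Maru Gothic style to your generations
Diffusers
You can also load the weights in diffusers. Because the script saves a plain state dict rather than a PEFT adapter folder, rebuild the same LoRA-wrapped UNet and load the weights into it:
from diffusers import UNet2DConditionModel
from peft import LoraConfig, get_peft_model
import torch
unet = UNet2DConditionModel.from_pretrained("stabilityai/stable-diffusion-2-1", subfolder="unet")
unet = get_peft_model(unet, LoraConfig(r=8, lora_alpha=16, target_modules=["to_q", "to_k", "to_v"], lora_dropout=0.1, bias="none"))
unet.load_state_dict(torch.load("C:/AI/output/zenmaru_lora/zenmaru_lora.pth", map_location="cpu"))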

📌 Common issues

1️⃣ CUDA version mismatch
If your torch build is incompatible with your CUDA version, install a matching build:

pip uninstall torch torchvision torchaudio
pip install torch==2.0.0+cu118 torchvision==0.15.1+cu118 torchaudio==2.0.1+cu118 --index-url https://download.pytorch.org/whl/cu118
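To see whether the installed torch build matches your GPU setup before reinstalling anything, a quick check along these lines can help. `cuda_report` is a hypothetical helper; it only reads attributes that PyTorch exposes (`torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()`).

```python
def cuda_report():
    """Summarize the installed torch build and whether it can see a GPU."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "cuda_build": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


print(cuda_report())
```

If `cuda_build` is None, a CPU-only wheel is installed; if it is set but `cuda_available` is False, the driver or CUDA runtime is the more likely culprit.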

2️⃣ Missing peft
If you see ModuleNotFoundError: No module named 'peft', install it:

pip install peft

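3️⃣ How can I speed up training?

These flags apply to the kohya-ss train_network.py command (in train_lora.py, adjust BATCH_SIZE and EPOCHS instead):
Increase --train_batch_size
Use --use_8bit_adam
Reduce --max_train_steps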

📌 Summary

✅ Trains a LoRA with diffusers + peft
✅ Targets Stable Diffusion 2.1
✅ Produces a model of the Zen Maru Gothic font style
✅ Can be used in diffusers, or in WebUI after conversion to a kohya-style LoRA file

Max的程式語言筆記