Widening the Discriminator's final block (model.8 Conv2d and model.9 BatchNorm2d) from 256 to 512 channels: the weights in the state_dict no longer match the new model's structure

2025-02-24

I changed final_channels hoping to further improve glyph detail. The Discriminator architecture before the change, as stored in the checkpoint:

Discriminator(
  (model): Sequential(
    (0): Conv2d(2, 64, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2))
    (1): LeakyReLU(negative_slope=0.2, inplace=True)
    (2): Conv2d(64, 128, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2), bias=False)
    (3): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (4): LeakyReLU(negative_slope=0.2, inplace=True)
    (5): Conv2d(128, 256, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2), bias=False)
    (6): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (7): LeakyReLU(negative_slope=0.2, inplace=True)
    (8): Conv2d(256, 256, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), bias=False)
    (9): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (10): LeakyReLU(negative_slope=0.2, inplace=True)
  )
  (binary): Linear(in_features=262144, out_features=1, bias=True)
  (catagory): Linear(in_features=262144, out_features=40, bias=True)
)

Code I ran:

checkpoint.pop("model.9.weight")
new_netD.load_state_dict(checkpoint, strict=False)

Error message:

Error(s) in loading state_dict for Discriminator:
size mismatch for model.8.weight: copying a param with shape torch.Size([256, 256, 5, 5]) from checkpoint, the shape in current model is torch.Size([512, 256, 5, 5]).
size mismatch for model.9.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for model.9.running_mean: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for model.9.running_var: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for binary.weight: copying a param with shape torch.Size([1, 262144]) from checkpoint, the shape in current model is torch.Size([1, 524288]).
size mismatch for catagory.weight: copying a param with shape torch.Size([40, 262144]) from checkpoint, the shape in current model is torch.Size([40, 524288]).

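The error comes from shape mismatches after the architecture change, so the state_dict cannot be loaded as is.
According to the error message, the final block went from 256 to 512 channels: model.8 (Conv2d) now has 512 output channels, model.9 (BatchNorm2d) has 512 features, and binary and catagory take 524288 input features instead of 262144. Popping model.9.weight alone removes only one of the seven mismatched tensors, so the remaining six still fail.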
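
To see every mismatch at once instead of reading them off the traceback, the checkpoint's shapes can be compared with the new model's state_dict. This is a minimal sketch: `TinyD` is a hypothetical, scaled-down stand-in for the Discriminator, and `find_shape_mismatches` is an illustrative helper, not a PyTorch API.

```python
import torch
import torch.nn as nn


class TinyD(nn.Module):
    """Hypothetical, scaled-down stand-in for the Discriminator."""

    def __init__(self, final_channels):
        super().__init__()
        self.model = nn.Sequential(
            nn.Conv2d(2, 8, kernel_size=5, stride=2, padding=2),
            nn.LeakyReLU(0.2),
            nn.Conv2d(8, final_channels, kernel_size=5, padding=2, bias=False),
            nn.BatchNorm2d(final_channels),
            nn.LeakyReLU(0.2),
        )
        # Assumes 8x8 inputs, so the feature map is 4x4 after the stride-2 conv
        self.binary = nn.Linear(final_channels * 4 * 4, 1)

    def forward(self, x):
        return self.binary(self.model(x).flatten(1))


def find_shape_mismatches(checkpoint, model):
    """Return (key, checkpoint_shape, model_shape) for shared keys whose shapes differ."""
    model_state = model.state_dict()
    return [
        (key, tuple(value.shape), tuple(model_state[key].shape))
        for key, value in checkpoint.items()
        if key in model_state and value.shape != model_state[key].shape
    ]


old_checkpoint = TinyD(final_channels=16).state_dict()  # plays the role of net_D.pth
new_model = TinyD(final_channels=32)

mismatches = find_shape_mismatches(old_checkpoint, new_model)
for key, old_shape, new_shape in mismatches:
    print(f"{key}: checkpoint {old_shape} -> model {new_shape}")
```

On the toy model this reports the widened conv, the four BatchNorm tensors, and the linear weight, which mirrors the pattern in the real error message. The linear bias is not listed because its shape does not depend on the channel count.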

📌 Solutions

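✅ Method 1: Ignore the mismatched weights (`strict=False`)

Passing strict=False to load_state_dict() makes PyTorch tolerate missing and unexpected keys. It does not skip a key whose shape differs: a size mismatch still raises RuntimeError, as the traceback above shows. Filter out the mismatched entries first, then load with strict=False:

import torch

checkpoint = torch.load("net_D.pth", map_location="cpu")

# Build the new model
net_D = Discriminator()  # your new model

# Keep only the entries whose name and shape match the new model
model_state = net_D.state_dict()
state_dict = {k: v for k, v in checkpoint.items()
              if k in model_state and v.shape == model_state[k].shape}

# strict=False tolerates the keys that were filtered out
net_D.load_state_dict(state_dict, strict=False)

print("✅ Weights loaded; mismatched layers were skipped!")

🔹 When to use: you just want loading to succeed while keeping every weight that still fits.
🔹 Drawback: the skipped layers keep their random initialization, which can hurt performance until they are retrained.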
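
One caveat is worth checking on a small case: strict=False covers missing and unexpected keys only, so a tensor with the wrong shape still raises. The sketch below uses two `nn.Linear` layers as stand-ins for the old and new model; the layer sizes are arbitrary.

```python
import torch.nn as nn

old_layer = nn.Linear(4, 1)  # stands in for the layer saved in the checkpoint
new_layer = nn.Linear(8, 1)  # same key names, wider input

checkpoint = old_layer.state_dict()

try:
    new_layer.load_state_dict(checkpoint, strict=False)
    raised = False
except RuntimeError as err:
    raised = True
    print("still fails:", str(err).splitlines()[0])

# Dropping the mismatched key turns it into a *missing* key, which strict=False tolerates
filtered = {k: v for k, v in checkpoint.items() if k != "weight"}
result = new_layer.load_state_dict(filtered, strict=False)
print("missing keys:", result.missing_keys)
```

The object returned by load_state_dict() exposes `missing_keys` and `unexpected_keys`, which is a convenient way to confirm which entries were left at their initial values.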

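✅ Method 2: Manually delete the mismatched layers

If you would rather name the entries to drop yourself, delete them from the checkpoint:

import torch

checkpoint = torch.load("net_D.pth", map_location="cpu")

# Delete the mismatched entries (their shapes differ in the new model)
del checkpoint["model.8.weight"]
del checkpoint["model.9.weight"]
del checkpoint["model.9.bias"]
del checkpoint["model.9.running_mean"]
del checkpoint["model.9.running_var"]
del checkpoint["binary.weight"]
del checkpoint["binary.bias"]  # shape still matches; dropped so the whole layer is re-initialized together
del checkpoint["catagory.weight"]
del checkpoint["catagory.bias"]  # same reasoning as binary.bias

# Build the model
net_D = Discriminator()

# Load the remaining state_dict
net_D.load_state_dict(checkpoint, strict=False)

print("✅ Loaded; mismatched layers were deleted!")

🔹 When to use: you know exactly which layers changed and do not want them chosen automatically.
🔹 Drawback: the deleted layers are re-initialized and need to be retrained.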
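
Hand-written key lists are easy to get wrong (a key that is absent makes `del` raise KeyError, and a forgotten key brings the size mismatch back). As an alternative sketch, the helper below drops entries by comparing shapes and returns what it skipped. `load_matching` and `make_net` are hypothetical names, and the toy network only imitates the widened conv plus BatchNorm.

```python
import torch
import torch.nn as nn


def load_matching(model, checkpoint):
    """Load only entries whose name and shape match the model; return the skipped keys."""
    model_state = model.state_dict()
    matching = {
        key: value
        for key, value in checkpoint.items()
        if key in model_state and value.shape == model_state[key].shape
    }
    skipped = sorted(set(checkpoint) - set(matching))
    model.load_state_dict(matching, strict=False)
    return skipped


def make_net(width):
    # Hypothetical toy network: only the second conv and its BatchNorm depend on `width`
    return nn.Sequential(
        nn.Conv2d(2, 8, kernel_size=5, padding=2),
        nn.Conv2d(8, width, kernel_size=5, padding=2, bias=False),
        nn.BatchNorm2d(width),
    )


old_net, new_net = make_net(16), make_net(32)
skipped = load_matching(new_net, old_net.state_dict())
print("skipped:", skipped)

# The unchanged first conv now carries the old weights
print(torch.equal(new_net[0].weight, old_net[0].weight))
```

Printing the skipped keys keeps the behaviour visible, so an unexpectedly long list is noticed before training starts.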

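✅ Method 3: Recompute `in_features`

The error message also reports that the in_features of the binary and catagory layers do not match:

size mismatch for binary.weight: copying a param with shape torch.Size([1, 262144]) from checkpoint, the shape in current model is torch.Size([1, 524288]).
size mismatch for catagory.weight: copying a param with shape torch.Size([40, 262144]) from checkpoint, the shape in current model is torch.Size([40, 524288]).

The in_features of binary and catagory doubled (262144 → 524288). The cause is the channel count, not the image resolution: the flattened feature map has channels × height × width elements, and 262144 = 256 × 1024 while 524288 = 512 × 1024, so the spatial size is unchanged and only the channels doubled.

Steps:

1. Take the new in_features from the error message:
new_in_features = 524288
2. Re-create binary and catagory with that size. This is redundant if the Discriminator class already derives the size from final_channels, which the "shape in current model" in the error suggests:
net_D.binary = nn.Linear(new_in_features, 1)
net_D.catagory = nn.Linear(new_in_features, 40)
3. Remove the old binary.weight and catagory.weight from the checkpoint, together with the other mismatched entries, then load:
net_D.load_state_dict(checkpoint, strict=False)

This loads the old state_dict while binary and catagory start from fresh weights sized for the new in_features.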
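
The two in_features values can be reproduced from the layer list printed earlier, using the standard Conv2d output-size formula. In this sketch `conv_out_size` and `flattened_features` are illustrative names, and the 256×256 input size is an assumption (it is a size consistent with 262144 features; the post does not state it).

```python
def conv_out_size(size, kernel, stride, padding):
    """Output length of one spatial dimension after a Conv2d (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def flattened_features(input_size, final_channels):
    size = input_size
    # Three stride-2 convs followed by one stride-1 conv, all 5x5 with padding 2
    for stride in (2, 2, 2, 1):
        size = conv_out_size(size, kernel=5, stride=stride, padding=2)
    return final_channels * size * size


# input_size=256 is an assumption, not a value given in the post
print(flattened_features(256, final_channels=256))  # 262144
print(flattened_features(256, final_channels=512))  # 524288
```

The spatial size stays at 32×32 in both cases, so doubling final_channels alone accounts for the doubled in_features.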

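📌 Best approach

To keep as many of the old weights as possible while fixing the mismatched parts:

import torch
import torch.nn as nn

checkpoint = torch.load("net_D.pth", map_location="cpu")

# Delete the mismatched entries
for key in ["model.8.weight",
            "model.9.weight", "model.9.bias", "model.9.running_mean", "model.9.running_var",
            "binary.weight", "binary.bias", "catagory.weight", "catagory.bias"]:
    if key in checkpoint:
        del checkpoint[key]

# Build the model
net_D = Discriminator()

# Re-create the final fully connected layers
# (redundant if the class already sizes them from final_channels)
new_in_features = 524288  # 512 channels x 1024 spatial positions
net_D.binary = nn.Linear(new_in_features, 1)
net_D.catagory = nn.Linear(new_in_features, 40)

# Load the weights
net_D.load_state_dict(checkpoint, strict=False)

print("✅ Reloaded successfully; mismatched layers fixed!")

This way you:

Keep the weights of the unchanged convolution blocks (model.0 to model.6), so training does not start from scratch.
Drop the widened conv (model.8), its BatchNorm (model.9) and the fully connected layers, which keep PyTorch's default initialization.
Re-define binary and catagory to fit the new input size.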

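New error message (what remains when model.8.weight is left in the checkpoint while the other mismatched entries are removed):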

new_netD.load_state_dict(checkpoint, strict=False)
File "\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 2584, in load_state_dict
raise RuntimeError(
RuntimeError: Error(s) in loading state_dict for Discriminator:
size mismatch for model.8.weight: copying a param with shape torch.Size([256, 256, 5, 5]) from checkpoint, the shape in current model is torch.Size([512, 256, 5, 5]).

This error means the weight shape of model.8 (model.8.weight) does not match:

Old model (checkpoint): model.8.weight is [256, 256, 5, 5]
New model (new_netD): model.8.weight is [512, 256, 5, 5]

This is what happens when ndf or final_channels is changed so that the out_channels of this convolution doubles (256 → 512).

📌 Solutions

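✅ Method 1: Remove `model.8.weight` and let it be re-initialized

Because model.8.weight has a different shape in the new model, delete it from the checkpoint and let the layer keep its random initialization. model.8 is a Conv2d with bias=False, so weight is its only entry. When net_D.pth is loaded fresh, the model.9 and fully connected entries from before have to be removed as well:

import torch

checkpoint = torch.load("net_D.pth", map_location="cpu")

# Delete the mismatched entries
for key in ["model.8.weight",
            "model.9.weight", "model.9.bias", "model.9.running_mean", "model.9.running_var",
            "binary.weight", "catagory.weight"]:
    if key in checkpoint:
        del checkpoint[key]

# Build the new model
new_netD = Discriminator()

# Load the weights (the deleted layers keep their initialization)
new_netD.load_state_dict(checkpoint, strict=False)

print("✅ Weights loaded; the model.8 mismatch was skipped!")

🔹 Effect: model.8 is randomly initialized, and every layer that still fits loads its old weights.
🔹 When to use: you know which layers changed and want to keep most of the old state_dict.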

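✅ Method 2: Adapt the old `model.8` weights to the new shape

If you do not want to discard the model.8 weights completely, expand the old tensors to 512 channels:

import torch

# Load the checkpoint
checkpoint = torch.load("net_D.pth", map_location="cpu")

# Old `model.8.weight` has shape [256, 256, 5, 5]
old_weight = checkpoint["model.8.weight"]

# Build the 512-channel weight by stacking the old filters twice along out_channels
new_weight = torch.cat([old_weight, old_weight.clone()], dim=0)  # [512, 256, 5, 5]

# Update the checkpoint
checkpoint["model.8.weight"] = new_weight

# The BatchNorm that follows is per-channel, so duplicate its tensors the same way
for name in ["weight", "bias", "running_mean", "running_var"]:
    key = f"model.9.{name}"
    if key in checkpoint:
        checkpoint[key] = torch.cat([checkpoint[key], checkpoint[key].clone()], dim=0)

# The fully connected weights cannot be reused as is; drop them
for key in ["binary.weight", "catagory.weight"]:
    checkpoint.pop(key, None)

# Build the new model
new_netD = Discriminator()

# Load the modified checkpoint
new_netD.load_state_dict(checkpoint, strict=False)

print("✅ model.8 weights adapted to the new shape!")

🔹 Effect: the old weights are expanded to the new size (256 → 512), so the layer starts from learned filters instead of noise. Channels 256 to 511 begin as exact copies of channels 0 to 255.
🔹 When to use: you want to keep as much of the old weights as possible, and only the out_channels of model.8 grew while kernel_size stayed the same.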
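
What the torch.cat widening does to the layer's output can be checked on a small convolution. A minimal sketch with arbitrary toy sizes (4 input channels, 6 → 12 output channels), not the real 256 → 512 layer:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

old_conv = nn.Conv2d(4, 6, kernel_size=5, padding=2, bias=False)
new_conv = nn.Conv2d(4, 12, kernel_size=5, padding=2, bias=False)

# Same widening trick as above: stack the old filters twice along out_channels
old_weight = old_conv.weight.detach()
widened = torch.cat([old_weight, old_weight.clone()], dim=0)
new_conv.load_state_dict({"weight": widened})

x = torch.randn(1, 4, 8, 8)
with torch.no_grad():
    old_out = old_conv(x)
    new_out = new_conv(x)

print(new_out.shape)  # torch.Size([1, 12, 8, 8])
# The two halves of the widened output carry the same activations as the old layer
print(torch.allclose(new_out[:, :6], new_out[:, 6:], atol=1e-5))
print(torch.allclose(new_out[:, :6], old_out, atol=1e-5))
```

The duplicated channels add no new information at first. They only diverge once training updates them differently, which happens here because the layers after them start from fresh random weights.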

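✅ Method 3: Train the new `model.8` layer directly

If model.8 is effectively a new layer, the simplest route is to retrain it:

1. Load the old checkpoint without the mismatched entries
2. Let model.8 keep its fresh initialization
3. Continue training

import torch

checkpoint = torch.load("net_D.pth", map_location="cpu")

# Delete the old entries that no longer fit.
# model.8 is a Conv2d with bias=False, so it has no bias or running statistics to delete.
for key in ["model.8.weight",
            "model.9.weight", "model.9.bias", "model.9.running_mean", "model.9.running_var",
            "binary.weight", "catagory.weight"]:
    checkpoint.pop(key, None)

# Build the new model
new_netD = Discriminator()

# Load the old weights; the deleted layers keep their initialization
new_netD.load_state_dict(checkpoint, strict=False)

# Train the new model
train(new_netD)  # your own training function

🔹 Effect: model.8 is trained from scratch while the other layers keep their learned weights.
🔹 When to use: model.8 is a brand-new layer or its structure changed substantially.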
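
One optional way to "train only the new layers" for a warm-up phase is to freeze everything that was loaded from the checkpoint. This is a sketch under assumptions, not something the original setup prescribes: `make_net` is a hypothetical toy network, and the Adam learning rate of 2e-4 is an arbitrary placeholder.

```python
import torch
import torch.nn as nn


def make_net(width):
    # Hypothetical toy network; only the last conv and its BatchNorm depend on `width`
    return nn.Sequential(
        nn.Conv2d(2, 8, kernel_size=5, padding=2),
        nn.Conv2d(8, width, kernel_size=5, padding=2, bias=False),
        nn.BatchNorm2d(width),
    )


old_net, new_net = make_net(16), make_net(32)

# Load only the entries that still fit
new_state = new_net.state_dict()
matching = {
    key: value
    for key, value in old_net.state_dict().items()
    if key in new_state and value.shape == new_state[key].shape
}
result = new_net.load_state_dict(matching, strict=False)

# missing_keys names every entry that kept its fresh initialization
fresh = set(result.missing_keys)
for name, param in new_net.named_parameters():
    param.requires_grad = name in fresh

trainable = [name for name, p in new_net.named_parameters() if p.requires_grad]
print("trainable:", trainable)

# Only the fresh parameters are handed to the optimizer for the warm-up phase
optimizer = torch.optim.Adam(
    [p for p in new_net.parameters() if p.requires_grad], lr=2e-4
)
```

After the warm-up, setting requires_grad back to True on all parameters and rebuilding the optimizer resumes normal training. Whether freezing helps a GAN discriminator at all depends on the setup, so treat it as an experiment.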

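📌 Best approach

To keep as many of the old weights as possible while fixing the mismatched model.8 layer:

import torch

checkpoint = torch.load("net_D.pth", map_location="cpu")

# Remove the mismatched entries
for key in ["model.8.weight",
            "model.9.weight", "model.9.bias", "model.9.running_mean", "model.9.running_var",
            "binary.weight", "catagory.weight"]:
    if key in checkpoint:
        del checkpoint[key]

# Build the new model
new_netD = Discriminator()

# Load the remaining weights
new_netD.load_state_dict(checkpoint, strict=False)

print("✅ Reloaded successfully; model.8 fixed!")

This way:

Only the entries whose shape changed are dropped (those layers are randomly initialized)
The weights of all other layers remain valid
The model loads and training can continue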

Max的程式語言筆記
