ResSkip 殘差模組比較

2025-04-192025-04-19

版本1:

class ResSkip(nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.depthwise = nn.Conv2d(in_channels, in_channels, 3, padding=1, groups=in_channels)
        self.pointwise = nn.Conv2d(in_channels, out_channels, 1)
        self.norm = nn.InstanceNorm2d(out_channels)
        self.act = nn.SiLU()
        self.skip = nn.Conv2d(in_channels, out_channels, 1) if in_channels != out_channels else nn.Identity()

    def forward(self, x):
        identity = self.skip(x)
        out = self.depthwise(x)
        out = self.pointwise(out)
        out = self.act(self.norm(out))
        return out + identity

版本2:

class ResSkip(nn.Module):
    def __init__(self, in_channels, out_channels):
        super(ResSkip, self).__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 3, 1, 1),
            nn.SiLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, 3, 1, 1)
        )

    def forward(self, x):
        return x + self.block(x)

很棒的問題！這兩個 ResSkip 版本都屬於「殘差模組」，但設計原理與用途略有不同，適合的場景也會有所區別。下面是兩者的詳細比較與分析，包括 差異點、可能的影響、使用情境建議：

🧱 結構差異總覽

項目	版本1：Depthwise + Pointwise	版本2：經典 2 層卷積殘差
卷積方式	Depthwise + Pointwise（類似 MobileNet）	兩層普通 Conv2d
是否包含 Norm/Act	✅ 有 InstanceNorm2d + SiLU	✅ 有 SiLU（但沒有 Norm）
通道數轉換	✅ 有處理 in/out 不同的 skip 連接	❌ 假設 in_channels == out_channels
引入參數	較少（depthwise 卷積為輕量）	較多（兩層 full conv）
性能/速度	較快、參數少	較慢、參數多
穩定性	穩定性好，但表達能力較弱	更強表達能力但稍重
是否有正則化	✅ 使用 InstanceNorm2d	❌ 沒有使用 norm

🎯 詳細比較分析

✅ 版本 1：MobileNet 風格 ResBlock（Depthwise + Pointwise）

# 特點：深度可分離卷積 + Pointwise + Norm + Act + Skip

優點：

非常輕量，計算成本低，適合部署或模型剪枝場景。
使用 InstanceNorm，幫助穩定特徵分布。
支援 in/out channel 不一致的情況（有 skip 轉接）。

缺點：

表達能力較弱（參數少、擴展性低）。
若特徵圖非常複雜，可能學不到足夠的非線性變化。

適用情境：

用在輕量化需求中，例如 PixelShuffleUpBlock 中上採樣後的穩定轉換。
或風格遷移中作為轉換平滑器而非重特徵提取者。

✅ 版本 2：標準 ResBlock（兩層 Conv）

# 特點：兩層全連接卷積 + 中間激活函數

優點：

結構簡單但表達力強，能夠處理複雜的特徵變換。
適合在中段層（deep layer）處理抽象語意。
若有需要可以進一步擴展為 bottleneck、加入 norm、dropout。

缺點：

沒有 norm，可能導致特徵不穩（但這其實依訓練流程與資料而異）。
假設 in_channels == out_channels，不具備適應性。

適用情境：

模型中後段的深層特徵抽象層。
用在高表達需求的解碼層，如 UNet 的 decoder。

⚠️ 若將版本1 用在 PixelShuffleUpBlock，有沒有問題？

答案：沒問題，而且很適合！甚至比版本2更穩定，理由如下：

PixelShuffle 本身已經大幅改變了空間維度和特徵位置。
使用 version1 的 depthwise + pointwise + norm 能平滑過渡特徵圖，避免因上採樣導致爆炸或崩潰。
而且 version1 還支援 in/out 通道不同，這點對 PixelShuffle 非常重要（如：in = 256, out = 64）。

✅ 建議策略（根據應用）

使用位置	建議使用版本
PixelShuffleUpBlock 後	✅ 版本1（更穩定，參數少）
UNet Encoder 中層	✅ 版本2（表達力強）
深層 Transformer/Attention 後	✅ 版本1 or 加入 norm 的版本2

🔁 最佳化方向（綜合版本）

你也可以寫一個更通用的 ResBlock 支援兩種模式：

class ResSkip(nn.Module):
    def __init__(self, in_channels, out_channels, mode='depthwise', norm=True):
        super().__init__()
        self.skip = nn.Conv2d(in_channels, out_channels, 1) if in_channels != out_channels else nn.Identity()

        if mode == 'depthwise':
            self.body = nn.Sequential(
                nn.Conv2d(in_channels, in_channels, 3, padding=1, groups=in_channels),
                nn.Conv2d(in_channels, out_channels, 1),
                nn.InstanceNorm2d(out_channels) if norm else nn.Identity(),
                nn.SiLU()
            )
        elif mode == 'standard':
            self.body = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 3, 1, 1),
                nn.InstanceNorm2d(out_channels) if norm else nn.Identity(),
                nn.SiLU(),
                nn.Conv2d(out_channels, out_channels, 3, 1, 1)
            )
        else:
            raise ValueError("Unknown mode")

    def forward(self, x):
        return self.body(x) + self.skip(x)

如果你希望我幫你直接整合這版到你的 PixelShuffleUpBlock 或整個 UNet 架構，我也可以幫你接上。要嗎？

Max的程式語言筆記