只在最外層使用 pixelshuffle

by max-stackoverflow | Posted on 2025-04-182025-04-18

你提供的 UnetSkipConnectionBlock 與 UNetGenerator 程式碼基本上是正確的，現在你想調整成 hybrid 模式：

「僅最外層使用 PixelShuffle，其餘使用 conv 或 upsample。」

我們可以在 UNetGenerator 建構子中控制 up_mode，讓每層的 up_mode 不同，具體實作如下：

✅ 修改方式：

將原本的單一 up_mode，改為 每層分配對應的 upsample 模式：

class UNetGenerator(nn.Module):
    def __init__(self, input_nc=1, output_nc=1, num_downs=8, ngf=32,
                 embedding_num=40, embedding_dim=64,
                 norm_layer=nn.InstanceNorm2d, use_dropout=False,
                 self_attention=False, blur=False, attention_type='linear',
                 attn_layers=None, up_mode='conv', freeze_downsample=False):
        super(UNetGenerator, self).__init__()
        if attn_layers is None:
            attn_layers = []

        # 自訂各層的上採樣方式（最外層使用 pixelshuffle，其餘使用 conv 或 upsample）
        up_modes = ['conv'] * (num_downs - 1) + ['pixelshuffle']

        # 最內層
        unet_block = UnetSkipConnectionBlock(
            ngf * 8, ngf * 8, input_nc=None, submodule=None,
            norm_layer=norm_layer, layer=1, embedding_dim=embedding_dim,
            self_attention=self_attention, blur=blur, innermost=True,
            use_transformer=True, attention_type=attention_type,
            attn_layers=attn_layers, up_mode=up_modes[0], freeze_downsample=freeze_downsample
        )

        # 中間層
        for i in range(num_downs - 5):
            unet_block = UnetSkipConnectionBlock(
                ngf * 8, ngf * 8, input_nc=None, submodule=unet_block,
                norm_layer=norm_layer, layer=i + 2, use_dropout=use_dropout,
                self_attention=self_attention, blur=blur, attention_type=attention_type,
                attn_layers=attn_layers, up_mode=up_modes[i + 1], freeze_downsample=freeze_downsample
            )

        # 其他高階層
        unet_block = UnetSkipConnectionBlock(ngf * 4, ngf * 8, submodule=unet_block, norm_layer=norm_layer, layer=5, up_mode=up_modes[-4], freeze_downsample=freeze_downsample)
        unet_block = UnetSkipConnectionBlock(ngf * 2, ngf * 4, submodule=unet_block, norm_layer=norm_layer, layer=6, up_mode=up_modes[-3], freeze_downsample=freeze_downsample)
        unet_block = UnetSkipConnectionBlock(ngf, ngf * 2, submodule=unet_block, norm_layer=norm_layer, layer=7, up_mode=up_modes[-2], freeze_downsample=freeze_downsample)

        # 最外層：使用 pixelshuffle
        self.model = UnetSkipConnectionBlock(
            output_nc, ngf, input_nc=input_nc, submodule=unet_block,
            norm_layer=norm_layer, layer=8, outermost=True,
            self_attention=self_attention, blur=blur,
            attention_type=attention_type, attn_layers=attn_layers, up_mode=up_modes[-1], freeze_downsample=freeze_downsample
        )

        self.embedder = nn.Embedding(embedding_num, embedding_dim)

🔄 結果

這樣設定後：

層級	`up_mode` 用法
Layer 8	`pixelshuffle`（最外層）✅
Layer 1~7	`conv`（或改成你需要的 hybrid conv/upsample）

如果你還想讓某些中層用 upsample，可以手動指定 up_modes 為：

up_modes = ['conv', 'conv', 'upsample', 'upsample', 'conv', 'conv', 'conv', 'pixelshuffle']

需要我也幫你補上讓 UNet 支援 switch_up_mode(layer_id, mode) 的方法來動態切換層的上採樣方式嗎？

這其實取決於你整體模型的設計目標與平衡點，但可以從幾個面向分析「在 innermost 或 outermost 使用 PixelShuffle 哪個比較好」：

✅ 若在 `innermost` 使用 PixelShuffle：

優點：

計算負擔小：PixelShuffle 在 innermost 處特徵圖尺寸最小，使用效率高、計算負擔也最低。
提前改善資訊重建品質：早期就進行精細的上採樣，讓之後的 decoding 層（上層）可以處理更高解析度的資訊。
可搭配 Transformer / FiLM 強化語義表示：你目前架構裡 Transformer + FiLM 都在 innermost，配合 PixelShuffle 提早放大資訊，也許更有助於語義展開。

缺點：

後續特徵圖 channel 數仍高，模型整體記憶體佔用可能沒大幅減少（因為還會 concat 跟 encode）。

✅ 若在 `outermost` 使用 PixelShuffle：

優點：

最終輸出品質更佳：PixelShuffle 可以減少 checkerboard artifact，比 transposed conv 更平滑。
避免過早解碼資訊：留到最後再進行高質量上採樣，讓前面專注於特徵抽取、匹配，最後階段才提升畫面品質。
視覺輸出友好：應用於生成字體這類細節明確任務時，最後階段保持高品質上採樣會更穩定。

缺點：

記憶體消耗大：最後一層通常 feature map 較大，PixelShuffle 會多占記憶體。
延遲解碼資訊可能有風險：若模型前面無法處理足夠的細節，最後再放大容易模糊。

🔧 建議策略：hybrid 模式（你現在的方向）是非常合理的

在 innermost 使用 PixelShuffle：平衡計算與 early decoding 精細度。
在其他層維持 conv or upsample：減少記憶體消耗。
在 outermost 加上 smoothing conv 可緩解 checkerboard artifact。

✅ 總結建議：

部位	適合使用 PixelShuffle 嗎？	原因
innermost	✅ 推薦	計算輕、資訊早期解碼、搭 FiLM / Transformer 效果好
outermost	✅ 條件使用	若追求視覺品質，可在最後加上 PixelShuffle + smoothing conv

如果你要做高品質中文字風格轉換，可以在 outermost 做 PixelShuffle，但若顧慮記憶體與效率，維持 innermost 就很理想。

需要的話我可以幫你調整目前的 UNetGenerator，變成 innermost 使用 PixelShuffle，outer 層用 conv/upsample。你也可以在 up_mode='hybrid' 形式中自動選擇 innermost 使用 PixelShuffle。需要嗎？

太好了，我來幫你把 UnetSkipConnectionBlock 和 UNetGenerator 的邏輯調整成 hybrid 模式，讓：

innermost 使用 PixelShuffle
其他層保持 conv（或你之後可選擇性用 upsample）

✅ 調整重點

up_mode 新增選項 'hybrid'。
在 UnetSkipConnectionBlock.__init__() 裡根據 innermost 與 up_mode=='hybrid' 自動選擇 pixelshuffle。
其餘層 fallback 回指定的 conv 或 upsample。

✅ 修改後的關鍵片段（只列出變動重點）

📌 在 `UnetSkipConnectionBlock` 裡，修改這行：

self.up_mode = 'pixelshuffle' if up_mode == 'hybrid' and innermost else up_mode

建議放在 constructor 一開始、self.up_mode = up_mode 之後：

self.up_mode = up_mode
if up_mode == 'hybrid' and innermost:
    self.up_mode = 'pixelshuffle'

這樣可保證在 innermost 的 block 裡，自動切換為 PixelShuffle 模式，其他照常。

✅ `UNetGenerator` 不需變，只要傳入 `'hybrid'`：

netG = UNetGenerator(
    input_nc=1, output_nc=1, num_downs=8,
    up_mode='hybrid',  # << 新增這個參數即可觸發 hybrid 行為
)

✅ 可選：讓 `UnetSkipConnectionBlock` 印出目前使用哪種 up_mode（除錯用）

在 __init__() 加一行：

print(f"[Layer {layer}] Using up_mode = {self.up_mode} (innermost={innermost})")

這樣你跑一次 model 初始化就知道哪一層用了 PixelShuffle。

✅ 結果

Layer	innermost	up_mode 傳入	實際使用 up_mode
Layer 1	✅	hybrid	pixelshuffle
Layer 2~7	❌	hybrid	conv（或你設定）
Layer 8（out）	❌	hybrid	conv（或你設定）

需要我幫你 patch 整份完整碼、或幫你加一個 .switch_up_mode() 自動切換所有 upsampling 層的函式也可以喔！

好的！以下是完整 hybrid 模式支援 + 可動態切換 up_mode 的版本，已經包含以下功能：

✅ 你需要的功能

✅ innermost 自動使用 PixelShuffle（如果 up_mode=’hybrid’）
✅ 其餘層維持原本的 'conv' 或 'upsample'
✅ 加入 switch_upsample_mode(new_mode) 方法 → 可隨時切換整個 UNet 的上採樣模式（含 hybrid）
✅ freeze_encoder() 可選（若你還想要）

🔧 修改重點（直接可用）

✅ 修改 `UnetSkipConnectionBlock`：

class UnetSkipConnectionBlock(nn.Module):
    def __init__(self, outer_nc, inner_nc, input_nc=None, submodule=None,
                 norm_layer=nn.InstanceNorm2d, layer=0, embedding_dim=64,
                 use_dropout=False, self_attention=False, attention_type='linear',
                 blur=False, outermost=False, innermost=False, use_transformer=False,
                 attn_layers=None, up_mode='conv', freeze_downsample=False):
        super(UnetSkipConnectionBlock, self).__init__()
        self.outermost = outermost
        self.innermost = innermost
        self.layer = layer
        self.attn_layers = attn_layers or []
        self.original_up_mode = up_mode  # << 原始設定
        self.up_mode = 'pixelshuffle' if up_mode == 'hybrid' and innermost else up_mode
        self.freeze_downsample = freeze_downsample
        ...
        # 其餘程式碼保持原樣

✅ 修改 `UNetGenerator`，加入 `switch_upsample_mode()`：

class UNetGenerator(nn.Module):
    def __init__(self, ...):  # 略
        ...
        self.model = UnetSkipConnectionBlock(
            output_nc, ngf, input_nc=input_nc, submodule=unet_block,
            norm_layer=norm_layer, layer=8, outermost=True,
            self_attention=self_attention, blur=blur,
            attention_type=attention_type, attn_layers=attn_layers, up_mode=up_mode,
            freeze_downsample=freeze_downsample
        )
        self.embedder = nn.Embedding(embedding_num, embedding_dim)
        self.up_mode = up_mode  # ← 儲存原始模式

    ...

    def switch_upsample_mode(self, new_mode: str):
        def switch_block_mode(block):
            if hasattr(block, 'submodule') and block.submodule is not None:
                switch_block_mode(block.submodule)
            if hasattr(block, 'original_up_mode'):
                block.up_mode = 'pixelshuffle' if new_mode == 'hybrid' and block.innermost else new_mode
                print(f"[Layer {block.layer}] Switched to up_mode = {block.up_mode}")

        self.up_mode = new_mode
        switch_block_mode(self.model)

✅ 使用方式：

netG = UNetGenerator(up_mode='hybrid')  # innermost → pixelshuffle

netG.switch_upsample_mode('conv')       # 切換為 conv 上採樣
netG.switch_upsample_mode('pixelshuffle')  # 所有層都用 pixelshuffle
netG.switch_upsample_mode('hybrid')     # 只有 innermost 用 pixelshuffle

✅ 若你也要凍結編碼器：

你已經有 freeze_downsample 參數，我可以幫你額外加個 netG.freeze_encoder() 來遞迴凍結 encoder 那邊的 block（不含 decoder 和 style）。

要加這功能嗎？還是你已經手動加好了？

你的觀察非常精準 👏。Checkerboard artifact 是典型的 ConvTranspose2d 問題，尤其是多層堆疊時更明顯。單獨在 innermost 層換成 PixelShuffle 確實可能「不夠」，因為 checkerboard 通常是從 decoder 早期（靠近 bottleneck）開始累積放大。

✅ 建議策略：後幾層 decoder 用 PixelShuffle，前幾層保留 Conv

這是更穩妥的 hybrid 設計，會達成這樣的編碼器/解碼器策略：

Downsampling（encoder）:
  Conv2d -> Conv2d -> ...

Upsampling（decoder）:
  PixelShuffle (最深層)
  PixelShuffle (次深層)
  ConvTranspose2d (淺層)
  ConvTranspose2d (最淺層)

🔧 改法：針對 layer 深度，自動決定 up_mode

✅ 修改 `UnetSkipConnectionBlock` 判斷方式（依照 `layer` 層數決定是否使用 PixelShuffle）：

# 原本這段判斷：
self.up_mode = 'pixelshuffle' if up_mode == 'hybrid' and innermost else up_mode

# 改為這樣：
if up_mode == 'hybrid':
    # 使用 PixelShuffle 的層數範圍（建議最深兩層）
    self.up_mode = 'pixelshuffle' if self.layer in [1, 2] else 'conv'
else:
    self.up_mode = up_mode

💡 你可以自訂調整使用 PixelShuffle 的層：

如果你覺得只用在 layer 1, 2 不夠，也可以擴充成 [1, 2, 3] 或 [1, 2, 3, 4]，這樣會讓更多 decoder 區塊使用 PixelShuffle，減少 checkerboard。

🔁 bonus：建議搭配 smoothing conv

你已經在 pixelshuffle 的後面加了：

nn.Conv2d(outer_nc, outer_nc, kernel_size=3, stride=1, padding=1, bias=use_bias)

這個是正確的！可以進一步平滑 pixelshuffle 的 output 避免 blocky artifact，請保留這層 smoothing conv。

📌 總結

層	up_mode（建議）	原因
Layer 1	`pixelshuffle`	bottleneck 區避免 checkerboard
Layer 2	`pixelshuffle`	bottleneck 附近效果疊加
Layer 3-5	`conv` or `upsample`	可以保留原本結構，有助收斂與解析度
Layer 6+（最淺）	`conv`	checkerboard 影響較小，可用原本方式

✅ 需要我幫你把這個 layer-based hybrid 策略整合進你的程式碼嗎？

我可以幫你快速 patch 一版 UnetSkipConnectionBlock 與 UNetGenerator，自動依照 layer 切換 up_mode！只要說一聲 👍

好的！以下是**支援 hybrid 上採樣模式（根據 layer 決定是否使用 PixelShuffle）**的版本，已整合進你的 UnetSkipConnectionBlock 與 UNetGenerator：

✅ 重點變更

新增 up_mode == "hybrid" 支援。
若是 hybrid，則第 1 層（innermost）和第 2 層會使用 pixelshuffle，其餘仍使用 conv。
無需修改其他地方，只需在建立模型時 up_mode='hybrid' 即可。

🔧 修改後的 `UnetSkipConnectionBlock.init()`：

# 判斷使用哪種上採樣模式（支援 hybrid 模式）
if up_mode == 'hybrid':
    # 根據層數決定是否使用 pixelshuffle
    self.up_mode = 'pixelshuffle' if layer in [1, 2] else 'conv'
else:
    self.up_mode = up_mode

🧩 整合範例（只需改這一段就好）：

請將你原本的 UnetSkipConnectionBlock.__init__() 中這行：

self.up_mode = up_mode

替換成以下程式碼：

if up_mode == 'hybrid':
    # 根據層數決定是否使用 pixelshuffle（可依需求修改 layer 範圍）
    self.up_mode = 'pixelshuffle' if layer in [1, 2] else 'conv'
else:
    self.up_mode = up_mode

✅ 如何啟用 hybrid 模式？

只要在你初始化 UNetGenerator 時，設定：

generator = UNetGenerator(..., up_mode='hybrid', ...)

即可啟用這種自動切換的 hybrid 模式。

🔍 想進一步調整哪些層用 pixelshuffle？

只要改這一行的 list：

self.up_mode = 'pixelshuffle' if layer in [1, 2] else 'conv'

你可以自由調整成 [1, 2, 3] 或 [1, 2, 3, 4] 等，來控制 pixelshuffle 的深度。

需要我幫你 patch 好一整版完整 UnetSkipConnectionBlock 程式碼貼上來？還是你會自己改這幾行就可以？

當然可以，以下是整合好 hybrid 模式的完整 UnetSkipConnectionBlock 程式碼 ✅

這個版本支援三種 up_mode：'conv'、'upsample'、'pixelshuffle'，以及 新的 'hybrid' 模式（在指定層使用 pixelshuffle，其餘使用 conv）：

🔧 `UnetSkipConnectionBlock`（含 hybrid 模式）完整程式碼：

class UnetSkipConnectionBlock(nn.Module):
    def __init__(self, outer_nc, inner_nc, input_nc=None, submodule=None,
                 norm_layer=nn.InstanceNorm2d, layer=0, embedding_dim=64,
                 use_dropout=False, self_attention=False, attention_type='linear',
                 blur=False, outermost=False, innermost=False, use_transformer=False,
                 attn_layers=None, up_mode='conv', freeze_downsample=False):
        super(UnetSkipConnectionBlock, self).__init__()
        self.outermost = outermost
        self.innermost = innermost
        self.layer = layer
        self.attn_layers = attn_layers or []

        # === hybrid 模式處理 ===
        if up_mode == 'hybrid':
            self.up_mode = 'pixelshuffle' if layer in [1, 2] else 'conv'
        else:
            self.up_mode = up_mode

        self.freeze_downsample = freeze_downsample

        use_bias = norm_layer != nn.BatchNorm2d
        if input_nc is None:
            input_nc = outer_nc

        kernel_size = 3 if innermost else 4
        stride = 1 if innermost else 2
        padding = 1

        downconv = nn.Conv2d(input_nc, inner_nc, kernel_size=kernel_size, stride=stride, padding=padding, bias=use_bias)
        nn.init.kaiming_normal_(downconv.weight, nonlinearity='leaky_relu')

        downrelu = nn.SiLU(inplace=True)
        downnorm = norm_layer(inner_nc)
        uprelu = nn.SiLU(inplace=True)
        upnorm = norm_layer(outer_nc)

        # === outermost 層 ===
        if outermost:
            if self.up_mode == 'conv':
                upconv = nn.ConvTranspose2d(inner_nc * 2, outer_nc, kernel_size=4, stride=2, padding=1, output_padding=1, bias=use_bias)
                nn.init.kaiming_normal_(upconv.weight)
            elif self.up_mode == 'upsample':
                upconv = nn.Sequential(
                    nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False),
                    nn.Conv2d(inner_nc * 2, outer_nc, kernel_size=3, stride=1, padding=1, bias=use_bias),
                    upnorm
                )
                nn.init.kaiming_normal_(upconv[1].weight)
            elif self.up_mode == 'pixelshuffle':
                upconv = nn.Sequential(
                    nn.Conv2d(inner_nc * 2, outer_nc * 4, kernel_size=3, stride=1, padding=1, bias=use_bias),
                    nn.PixelShuffle(2),
                    nn.Conv2d(outer_nc, outer_nc, kernel_size=3, stride=1, padding=1, bias=use_bias),
                    upnorm
                )
                nn.init.kaiming_normal_(upconv[0].weight)
                nn.init.kaiming_normal_(upconv[2].weight)
            else:
                raise ValueError(f"Unsupported up_mode: {self.up_mode}")
            self.down = nn.Sequential(downconv)
            self.up = nn.Sequential(uprelu, upconv, nn.Tanh())

        # === innermost 層 ===
        elif innermost:
            if self.up_mode == 'conv':
                upconv = nn.ConvTranspose2d(inner_nc, outer_nc, kernel_size=4, stride=2, padding=1, output_padding=1, bias=use_bias)
                nn.init.kaiming_normal_(upconv.weight)
            elif self.up_mode == 'upsample':
                upconv = nn.Sequential(
                    nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False),
                    nn.Conv2d(inner_nc, outer_nc, kernel_size=3, stride=1, padding=1, bias=use_bias),
                    upnorm
                )
                nn.init.kaiming_normal_(upconv[1].weight)
            elif self.up_mode == 'pixelshuffle':
                upconv = nn.Sequential(
                    nn.Conv2d(inner_nc, outer_nc * 4, kernel_size=3, stride=1, padding=1, bias=use_bias),
                    nn.PixelShuffle(2),
                    nn.Conv2d(outer_nc, outer_nc, kernel_size=3, stride=1, padding=1, bias=use_bias),
                    upnorm
                )
                nn.init.kaiming_normal_(upconv[0].weight)
                nn.init.kaiming_normal_(upconv[2].weight)
            else:
                raise ValueError(f"Unsupported up_mode: {self.up_mode}")

            self.down = nn.Sequential(downrelu, downconv)
            self.up = nn.Sequential(uprelu, upconv, upnorm)
            if use_transformer:
                self.transformer_block = TransformerBlock(inner_nc)
            self.film = FiLMModulation(inner_nc, embedding_dim)

        # === 中間層 ===
        else:
            if self.up_mode == 'conv':
                upconv = nn.ConvTranspose2d(inner_nc * 2, outer_nc, kernel_size=4, stride=2, padding=1, output_padding=1, bias=use_bias)
                nn.init.kaiming_normal_(upconv.weight)
            elif self.up_mode == 'upsample':
                upconv = nn.Sequential(
                    nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False),
                    nn.Conv2d(inner_nc * 2, outer_nc, kernel_size=3, stride=1, padding=1, bias=use_bias),
                    upnorm
                )
                nn.init.kaiming_normal_(upconv[1].weight)
            elif self.up_mode == 'pixelshuffle':
                upconv = nn.Sequential(
                    nn.Conv2d(inner_nc * 2, outer_nc * 4, kernel_size=3, stride=1, padding=1, bias=use_bias),
                    nn.PixelShuffle(2),
                    nn.Conv2d(outer_nc, outer_nc, kernel_size=3, stride=1, padding=1, bias=use_bias),
                    upnorm
                )
                nn.init.kaiming_normal_(upconv[0].weight)
                nn.init.kaiming_normal_(upconv[2].weight)
            else:
                raise ValueError(f"Unsupported up_mode: {self.up_mode}")

            self.down = nn.Sequential(downrelu, downconv, downnorm)
            self.up = nn.Sequential(uprelu, upconv, upnorm)
            if use_dropout:
                self.up.add_module("dropout", nn.Dropout(0.3))

        self.submodule = submodule

        if self_attention and self.layer in self.attn_layers:
            self.attn_block = LinearAttention(inner_nc) if attention_type == 'linear' else SelfAttention(inner_nc)
        else:
            self.attn_block = None

        self.res_skip = ResSkip(outer_nc, outer_nc) if not outermost and not innermost and layer in [4, 5, 6, 7] else None

        if self.freeze_downsample:
            for param in downconv.parameters():
                param.requires_grad = False
            for param in downnorm.parameters():
                param.requires_grad = False

    def forward(self, x, style=None):
        if hasattr(self, 'attn_block') and self.attn_block is not None:
            x = self.attn_block(x)
        encoded = self.down(x)
        if self.innermost:
            if hasattr(self, 'transformer_block'):
                encoded = self.transformer_block(encoded)
            if hasattr(self, 'film') and style is not None:
                encoded = self.film(encoded, style)
            decoded = self.up(encoded)
            if decoded.shape[2:] != x.shape[2:]:
                decoded = F.interpolate(decoded, size=x.shape[2:], mode='bilinear', align_corners=False)
            if hasattr(self, 'res_skip') and self.res_skip is not None:
                decoded = self.res_skip(decoded)
            return torch.cat([x, decoded], 1), encoded.contiguous().view(x.shape[0], -1)
        else:
            sub_output, encoded_real_A = self.submodule(encoded, style)
            decoded = self.up(sub_output)
            if decoded.shape[2:] != x.shape[2:]:
                decoded = F.interpolate(decoded, size=x.shape[2:], mode='bilinear', align_corners=False)
            if hasattr(self, 'res_skip') and self.res_skip is not None:
                decoded = self.res_skip(decoded)
            if self.outermost:
                return decoded, encoded_real_A
            else:
                return torch.cat([x, decoded], 1), encoded_real_A

✅ 若你需要，我也可以幫你 patch UNetGenerator 讓整體看起來更一致。
或是提供一版「只有最後幾層用 pixelshuffle，其餘用 conv」的版本（例如：layer >= 6 才 pixelshuffle）來對應 checkerboard 修正最佳化策略。需要的話隨時說！

Max的程式語言筆記