CoCosNet: Cross-domain Correspondence Learning for Exemplar-based Image Translation

Pan Zhang¹, Bo Zhang², Dong Chen², Lu Yuan³, Fang Wen²
(¹UST China, ²Microsoft Research Asia, ³Microsoft Cloud+AI)
[paper][code][project]

Style Image (Exemplar) + Condition ⇒ Image Synthesis

•

Generates a new image from a style image (the exemplar) and one of several kinds of conditional input (semantic map, pose, edge, layout, etc.).

•

Whereas earlier SPADE-style methods are VAE-based, this paper first computes the correlation between the input condition and the exemplar and then derives the modulation parameters from it, in a StyleGAN-like framework.

•

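The AdaIN-style parameters are injected into the side of the generator, at its intermediate layers rather than only at its input, which is presumably why the results are better.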

Intro & Overview

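To account for the relationship between the exemplar and the input, the method aligns the two domains and computes a correlation matrix.

Using this matrix, the exemplar is warped to fit the condition.

Modulation parameters are then computed from the warped exemplar, as in SPADE, and used to synthesize the image.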
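The correlation-then-warp step can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: features are passed in as flattened `(positions, channels)` arrays already mapped to the shared domain, the correlation is taken as cosine similarity of channel-wise centered features, and the function names and the temperature `alpha=100.0` are hypothetical choices, not values taken from these notes.

```python
import numpy as np

def correlation_matrix(feat_x, feat_y, eps=1e-8):
    """Cosine similarity between condition features (N, C) and exemplar features (M, C)."""
    # Assumed preprocessing: center each feature vector over channels, then L2-normalize.
    fx = feat_x - feat_x.mean(axis=1, keepdims=True)
    fy = feat_y - feat_y.mean(axis=1, keepdims=True)
    fx = fx / (np.linalg.norm(fx, axis=1, keepdims=True) + eps)
    fy = fy / (np.linalg.norm(fy, axis=1, keepdims=True) + eps)
    return fx @ fy.T  # (N, M)

def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def warp_exemplar(corr, exemplar_pixels, alpha=100.0):
    """Soft warp: every condition position takes a weighted average of exemplar pixels."""
    weights = softmax(alpha * corr, axis=1)  # (N, M), each row sums to 1
    return weights @ exemplar_pixels         # (N, channels)
```

With a large `alpha` the softmax is nearly one-hot, so each position simply copies its best-matching exemplar pixel; a smaller `alpha` blends several candidates.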

•

The contribution of this paper is that it computes the correlation between the exemplar and the condition.

◦

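Earlier SPADE-style methods apply style by encoding the style image with an encoder to obtain the parameters mu and sigma, then sampling from that distribution and feeding the sample to the generator as its input.

◦

Here, in contrast, the two images are encoded together, their correlation is computed, the exemplar is warped, and the result drives the generator in a StyleGAN-like way. I found this approach novel.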
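To make the difference between a global style code and modulation from a warped exemplar concrete, here is a minimal NumPy sketch. It assumes activations of shape `(C, H, W)` normalized per channel over spatial positions; in a real model the `gamma`/`beta` values would be predicted by learned layers, so passing them in directly (and the function names) is purely illustrative.

```python
import numpy as np

def normalize(act, eps=1e-5):
    """Normalize each channel of a (C, H, W) activation over its spatial positions."""
    mean = act.mean(axis=(1, 2), keepdims=True)
    std = act.std(axis=(1, 2), keepdims=True)
    return (act - mean) / (std + eps)

def adain_modulate(act, gamma, beta):
    """Global modulation: one (gamma, beta) scalar pair per channel, shapes (C,)."""
    return gamma[:, None, None] * normalize(act) + beta[:, None, None]

def spatial_modulate(act, gamma_map, beta_map):
    """Spatially varying modulation: a (gamma, beta) pair per position, shapes (C, H, W)."""
    return gamma_map * normalize(act) + beta_map
```

If the maps are constant over space, `spatial_modulate` reduces to `adain_modulate`; the spatial version matters because a warped exemplar carries different style information at different positions.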

Methodology

Cross-domain correspondence network

Correspondence within shared domain

Translation network

Losses

Losses for pseudo exemplar pairs

\mathcal{L}_{feat}=\sum_l\lambda_l \| \phi_l(\mathcal{G}(x_A,x'_B)) -\phi_l(x_B) \|_1
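As a rough sketch of how this weighted sum over layers could be evaluated, assume the activations \phi_l of some pretrained feature network have already been extracted and are passed in as lists of arrays. The function name and the argument layout are hypothetical; practical implementations often take a mean instead of a sum.

```python
import numpy as np

def feature_matching_loss(feats_out, feats_target, lambdas):
    """Weighted sum over layers of the L1 distance between two sets of activations."""
    total = 0.0
    for f_out, f_tgt, lam in zip(feats_out, feats_target, lambdas):
        total += lam * np.abs(f_out - f_tgt).sum()  # L1 norm of the layer-l difference
    return total
```

For example, two layers with differences of L1 size 4 and 3 and weights 0.5 and 2.0 give 0.5·4 + 2.0·3 = 8.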

Domain alignment Loss

\mathcal{L}^{l_1}_{domain}= || \mathcal{F}_{A\rightarrow S}(x_A) -\mathcal{F}_{B\rightarrow S}(x_B) ||_1

•

Wouldn't a KL-divergence loss work better here?
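To make the question concrete, the sketch below puts the L1 domain loss next to one possible KL variant. KL divergence is defined between probability distributions, so the features first have to be turned into distributions; doing that with a softmax over channels is my own assumption here, and the note does not say which variant was intended. Features are assumed to be `(positions, channels)` arrays.

```python
import numpy as np

def domain_l1(feat_a, feat_b):
    """L1 distance between the two shared-domain feature maps."""
    return np.abs(feat_a - feat_b).sum()

def log_softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))

def domain_kl(feat_a, feat_b):
    """KL(p || q), where p and q are softmax distributions over channels (assumed)."""
    log_p = log_softmax(feat_a)
    log_q = log_softmax(feat_b)
    return (np.exp(log_p) * (log_p - log_q)).sum()
```

One visible difference: adding a constant to every channel changes the L1 loss but leaves this KL variant at zero, because the softmax is shift-invariant. Whether that invariance helps or hurts alignment is an open question.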

Exemplar translation losses
3-1. Perceptual Loss

\mathcal{L}_{perc}= || \phi_l(\hat{x}_B) -\phi_l(x_B) ||_1

3-2. Contextual Loss

\mathcal{L}_{context}= \sum_l \omega_l[ -\log( \frac{1}{n_l}\sum_i\max_jA^l( \psi^l_i(\hat{x}_B),\psi^l_j(y_B) )) ]
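The affinity A^l is not defined in these notes. The sketch below assumes the usual contextual-loss construction: cosine distances between features, normalized by each row's minimum, turned into weights with a bandwidth `h`, and normalized over the exemplar positions. The bandwidth value, the centering on the exemplar mean, and all function names are illustrative assumptions.

```python
import numpy as np

def contextual_loss_layer(feat_x, feat_y, h=0.5, eps=1e-5):
    """Contextual term for one layer; feat_x is (N, C) output, feat_y is (M, C) exemplar."""
    mu = feat_y.mean(axis=0, keepdims=True)           # center both sets on the exemplar mean
    fx, fy = feat_x - mu, feat_y - mu
    fx = fx / (np.linalg.norm(fx, axis=1, keepdims=True) + 1e-8)
    fy = fy / (np.linalg.norm(fy, axis=1, keepdims=True) + 1e-8)
    dist = 1.0 - fx @ fy.T                            # cosine distance, (N, M)
    dist_norm = dist / (dist.min(axis=1, keepdims=True) + eps)
    w = np.exp((1.0 - dist_norm) / h)
    affinity = w / w.sum(axis=1, keepdims=True)       # A(i, j), rows sum to 1
    return -np.log(affinity.max(axis=1).mean())       # -log(1/n * sum_i max_j A)

def contextual_loss(feats_x, feats_y, omegas):
    """Weighted sum of the per-layer terms."""
    return sum(w * contextual_loss_layer(fx, fy)
               for fx, fy, w in zip(feats_x, feats_y, omegas))
```

The loss is close to zero when every output feature has a clear match among the exemplar features, regardless of where that match is located, which is why it suits style transfer between spatially unaligned images.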

Correspondence regularization

\mathcal{L}_{reg}=|| r_{y\rightarrow x\rightarrow y}-y_B ||_1

where

r_{y\rightarrow x\rightarrow y}(v)=\sum_u \mathrm{softmax}_u(\alpha\mathcal{M}(u,v))\cdot r_{y\rightarrow x}(u)

is the forward-backward warped image.
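A small NumPy sketch of this forward-backward check follows. It assumes that the weights in both directions come from a softmax over \alpha\mathcal{M}, that the forward warp r_{y\rightarrow x} is the softmax-weighted average of exemplar pixels, and that images are flattened to `(positions, channels)`; `alpha=100.0` and the function names are hypothetical.

```python
import numpy as np

def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cycle_regularization(corr, exemplar, alpha=100.0):
    """L1 gap between the exemplar and its forward-backward warp.

    corr:     (N, M) correlation M(u, v); u indexes condition positions, v exemplar positions.
    exemplar: (M, C) flattened exemplar y_B.
    """
    w_fwd = softmax(alpha * corr, axis=1)  # normalize over v for each u
    r_yx = w_fwd @ exemplar                # forward warp r_{y->x}(u), shape (N, C)
    w_bwd = softmax(alpha * corr, axis=0)  # normalize over u for each v
    r_yxy = w_bwd.T @ r_yx                 # backward warp r_{y->x->y}(v), shape (M, C)
    return np.abs(r_yxy - exemplar).sum()
```

A correlation that is a clean one-to-one matching brings every pixel back to where it started, so the loss is near zero; a flat, uninformative correlation averages the exemplar away and is penalized.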

Adversarial Loss

\mathcal{L}_{adv}^\mathcal{D}=-\mathbb{E}[h(\mathcal{D}(y_B))] -\mathbb{E}[h(-\mathcal{D}(\mathcal{G}(x_A,y_B)))]

\mathcal{L}_{adv}^\mathcal{G}= -\mathbb{E}[\mathcal{D}(\mathcal{G}(x_A,y_B))]

where

h(t)=\min(0,-1+t)

is the hinge function used to regularize the discriminator \mathcal{D}.
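The hinge terms are easy to check numerically. The sketch below takes raw discriminator scores as arrays; for the generator it uses the common hinge-GAN objective of simply maximizing the score on generated images, which is stated here as an assumption. Function names are illustrative.

```python
import numpy as np

def h(t):
    """Hinge function h(t) = min(0, -1 + t)."""
    return np.minimum(0.0, -1.0 + t)

def discriminator_loss(d_real, d_fake):
    """-E[h(D(real))] - E[h(-D(fake))]; zero once real scores >= 1 and fake scores <= -1."""
    return -h(d_real).mean() - h(-d_fake).mean()

def generator_loss(d_fake):
    """-E[D(fake)]: the generator pushes the scores of its outputs up (assumed form)."""
    return -d_fake.mean()
```

Because `h` saturates at zero, the discriminator gets no further reward for samples it already classifies with a margin of 1, which is the regularizing effect of the hinge.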

•

Total Loss

\mathcal{L}_\theta= \min_{\mathcal{F},\mathcal{T},\mathcal{G}} \max_{\mathcal{D}} \psi_1\mathcal{L}_{feat}+ \psi_2\mathcal{L}_{perc}+ \psi_3\mathcal{L}_{context}+ \psi_4\mathcal{L}_{adv}^{\mathcal{G}}+ \psi_5\mathcal{L}_{domain}^{l_1}+ \psi_6\mathcal{L}_{reg}

Experimental Results

Datasets

•

ADE20k (Segmentation → indoor image)

◦

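This dataset gave me a lot of trouble when running the code.
Do not download it from the official ADE20k site; you need to get the dataset called ADEChallenge2016 (its annotations are grayscale) and run with that. (The paper, however, just calls it ADE20k.)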

•

ADE20k-outdoor (Segmentation → outdoor image)

◦

A subset of ADE20k.

•

CelebA-HQ (Edge → face image)

◦

Edges are extracted with a Canny edge detector, and the edge-to-image task is run on them.

•

DeepFashion (Pose → Image)

•

LSUN Bedroom (Segmentation → indoor image)

◦

Unlike in other work, LSUN Bedroom was not used here.

Exp 1. Qualitative Comparisons

Exp 2. Quantitative Comparisons

Exp 3. User Study

Exp 4. Cross-domain Correspondence

Exp 5. Ablation Study

Exp 6. Application: Image Editing, Make-up Transfer