SDDM: Score-Decomposed Diffusion Models on Manifolds for Unpaired Image-to-Image Translation

Shikun Sun, Longhui Wei, Junliang Xing, Jia Jia, Qi Tian (Huawei Cloud)
[paper]

Intro & Overview

Contribution of this work.

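To optimize the entangled distributions that arise in conditional image synthesis, the authors develop SDDM, which decomposes the score function into an image denoising part and a content refinement part.

They introduce a multi-objective optimization algorithm into conditional generation with score-based diffusion models (SBDMs), which allows more elaborate gradient-combination algorithms and adjustment of the score coefficient.

They design a new Block Adaptive Instance Normalization (BAdaIN) module that works in a lower dimension than the existing low-pass filter.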

Controllable Generation with Score-based Model

• This section continues from [this post].

• In a score-based model, conditioning enters the reverse-time SDE as a guidance term:

\text{d}\mathbf{x}= \left[\ \mathbf{f}(\mathbf{x},t) - g^2(t)\left( \underbrace{\mathbf{s}_\theta(\mathbf{x},t) }_{\substack{ \text{score model} }} + \underbrace{ \red{ \nabla_{\mathbf{x}} \epsilon(\mathbf{x},t)} }_{\substack{ \text{guidance} }} \right) \right]\text{d}t + g(t)\text{d}\bar{\mathbf{w}}

• To tackle unpaired I2I, EGSDE (Zhao et al., 2022) designs two energy-based guidance functions.

\text{d}\mathbf{x}= \left[\ \mathbf{f}(\mathbf{x},t) - g^2(t)\left( \mathbf{s}_\theta(\mathbf{x},t) + \underbrace{ \red{ \nabla_{\mathbf{x}} \epsilon(\mathbf{x},\mathbf{y}_0,t)} }_{\substack{ \text{guidance} }} \right) \right]\text{d}t + g(t)\text{d}\bar{\mathbf{w}}

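However, the authors point out that energy-based methods cannot prevent the intermediate distributions from being perturbed excessively or adversely, and that neither guidance function fully exploits the statistics of the reference image. As a result, the outputs are suboptimal.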
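As a rough illustration of how a guided reverse SDE of this form is simulated, the sketch below takes one Euler-Maruyama step backward in time. The function name `guided_reverse_step` and its callable arguments (`drift`, `diffusion`, `score`, `guidance_grad`) are hypothetical stand-ins for $\mathbf f$, $g$, $\mathbf s_\theta$ and $\nabla_{\mathbf x}\epsilon$. The sign of the guidance term follows the equation as written in this post, not any particular codebase.

```python
import math
import random


def guided_reverse_step(x, t, dt, drift, diffusion, score, guidance_grad, rng):
    """One Euler-Maruyama step of the guided reverse SDE, from time t to t - dt.

    x is a list of floats; drift, score and guidance_grad return lists of the
    same length; diffusion returns the scalar g(t). All four callables are
    hypothetical placeholders supplied by the caller.
    """
    g = diffusion(t)
    f = drift(x, t)
    s = score(x, t)
    e = guidance_grad(x, t)
    x_next = []
    for i in range(len(x)):
        # Bracketed drift of the reverse SDE: f - g^2 * (score + guidance).
        reverse_drift = f[i] - g * g * (s[i] + e[i])
        # Stepping backward in time flips the sign of the drift increment.
        noise = rng.gauss(0.0, 1.0)
        x_next.append(x[i] - reverse_drift * dt + g * math.sqrt(dt) * noise)
    return x_next


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy choices (assumptions): VP-style drift, unit diffusion,
    # standard-normal score, and no guidance.
    x = guided_reverse_step(
        [1.0, -0.5, 2.0], t=1.0, dt=0.01,
        drift=lambda x, t: [-0.5 * v for v in x],
        diffusion=lambda t: 1.0,
        score=lambda x, t: [-v for v in x],
        guidance_grad=lambda x, t: [0.0 for _ in x],
        rng=rng,
    )
    print(x)
```

In EGSDE-style guidance, `guidance_grad` would additionally depend on the reference image $\mathbf y_0$; here that dependence is left to the caller's closure.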

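Limitations of Existing Score-based Models on the I2I Task

Fixed score guidance coefficient: the stochastic differential equation (SDE) of the reverse diffusion process fixes the coefficient of the score guidance, which limits the flexibility of tuning the model.

Unclear effect of energy guidance: how the energy guidance affects the intermediate distributions is not well understood, so results on the I2I task can be unsatisfactory when the number of iterations is insufficient.

Interference with intermediate distributions: there is currently no way to guarantee that the intermediate distributions are not adversely disturbed during guidance.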

Proposed Method Overview

To overcome these limitations, the authors propose to define the score function on a new manifold and decompose it there. They argue that this provides better energy guidance and statistical guidance.

Preliminaries: Topology and manifolds

Basic Knowledge about Manifold.

This paper requires some basic knowledge of topology. First, a topological space is defined as follows.

Given a set $X$, a family $\mathcal T \subseteq \mathcal P(X)$ of subsets of $X$ (where $\mathcal P(X)$ is the power set of $X$) is called a topology on $X$ if it satisfies the following three conditions, and $(X,\mathcal T)$ is then called a topological space.

$\emptyset, X \in \mathcal T$

$\bigcup_{\alpha\in A}T_\alpha \in \mathcal T$ for any family $\{T_\alpha\}_{\alpha\in A} \subseteq \mathcal T$

$\bigcap^n_{i=1}T_i\in\mathcal T$ for any $T_1,\dots,T_n \in \mathcal T$

In words:

$\mathcal T$ contains the empty set and the whole set $X$.

Any union of members of $\mathcal T$ belongs to $\mathcal T$.

Any finite intersection of members of $\mathcal T$ belongs to $\mathcal T$.
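On a finite set the three axioms can be checked mechanically, which may help make them concrete. The sketch below is only an illustration; the helper name `is_topology` is made up for this example.

```python
from itertools import combinations


def is_topology(X, T):
    """Check the three topology axioms for a family T of subsets of a finite set X."""
    X = frozenset(X)
    T = {frozenset(s) for s in T}
    # Every member of T must be a subset of X.
    if not all(s <= X for s in T):
        return False
    # Axiom 1: the empty set and X itself belong to T.
    if frozenset() not in T or X not in T:
        return False
    # Axioms 2 and 3: for a finite family, closure under pairwise unions and
    # intersections implies closure under all unions and finite intersections.
    for A, B in combinations(T, 2):
        if A | B not in T or A & B not in T:
            return False
    return True


if __name__ == "__main__":
    X = {1, 2, 3}
    print(is_topology(X, [set(), {1}, {1, 2}, {1, 2, 3}]))  # nested open sets
    print(is_topology(X, [set(), {1}, {2}, {1, 2, 3}]))     # {1} | {2} is missing
```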

Definition 5. Topological space
A topological space $\mathcal M$ is locally Euclidean of dimension $n$ if every point $p$ in $\mathcal M$ has a neighborhood $U$ such that there is a homeomorphism $\phi$ from $U$ onto an open subset of $\mathbb R^n$. We call the pair $(U, \phi : U \rightarrow \mathbb R^n)$ a chart, $U$ a coordinate neighborhood or an open coordinate set, and $\phi$ a coordinate map or a coordinate system on $U$. We say that a chart $(U, \phi)$ is centered at $p \in U$ if $\phi(p) = 0$. A chart $(U, \phi)$ about $p$ simply means that $(U, \phi)$ is a chart and $p \in U$.

Definition 5 in plain terms:

• A topological space $\mathcal{M}$ is locally Euclidean of dimension $n$ when:

◦ every point $p$ of $\mathcal{M}$ has a neighborhood $U$, and

◦ this $U$ admits a homeomorphism $\phi$ onto an open subset of $\mathbb{R}^n$.

▪ A note on homeomorphisms: for two topological spaces $X, Y$, if there is a bijection $f:X\rightarrow Y$ such that both $f$ and its inverse $f^{-1}$ are continuous, then $f$ is called a homeomorphism and the two spaces are said to be homeomorphic.

▪ In other words, a homeomorphism is a continuous, invertible one-to-one mapping whose inverse is also continuous.

• Such a pair $(U, \phi : U \rightarrow \mathbb{R}^n)$ is called a chart.

• The chart $(U, \phi)$ is centered at $p \in U$ when $\phi(p) = 0$.

• A chart $(U, \phi)$ about $p$ simply means that $(U, \phi)$ is a chart and $p \in U$.

Definition 6. Locally Euclidean property.

The locally Euclidean property means that for each $p \in \mathcal M$, we can find the following:

• an open set $U \subset \mathcal M$ containing $p$;

• an open set $\tilde U \subset \mathbb R^n$; and

• a homeomorphism $\phi : U \rightarrow \tilde U$ (i.e., a continuous bijective map with continuous inverse).

In short, every point $p$ of $\mathcal M$ has a neighborhood that is homeomorphic to an open subset of $\mathbb R^n$.
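A standard example, not taken from the paper, is the unit circle $S^1 \subset \mathbb R^2$: removing the point $(-1, 0)$ leaves an open set $U$ that the angle map sends homeomorphically onto the open interval $(-\pi, \pi) \subset \mathbb R^1$. The sketch below, with the illustrative names `chart` and `chart_inverse`, checks numerically that the two maps invert each other.

```python
import math


def chart(point):
    """Coordinate map phi: U -> (-pi, pi), where U is the unit circle minus (-1, 0)."""
    x, y = point
    return math.atan2(y, x)


def chart_inverse(theta):
    """Inverse map phi^{-1}: (-pi, pi) -> U."""
    return (math.cos(theta), math.sin(theta))


if __name__ == "__main__":
    for theta in (-3.0, -1.0, 0.0, 0.5, 3.0):
        round_trip = chart(chart_inverse(theta))
        print(theta, round_trip)
```

Since $\phi(1, 0) = 0$, this chart is centered at the point $(1, 0)$ in the sense of Definition 5.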

Definition 7. Topological manifold.

Suppose $\mathcal M$ is a topological space. We say $\mathcal M$ is a topological manifold of dimension $n$, or a topological $n$-manifold, if it has the following properties:

• $\mathcal M$ is a Hausdorff space: for every pair of distinct points $p, q \in \mathcal M$, there are disjoint open subsets $U, V \subset \mathcal M$ such that $p \in U$ and $q \in V$.

• $\mathcal M$ is second countable: there exists a countable basis for the topology of $\mathcal M$.

• $\mathcal M$ is locally Euclidean of dimension $n$: every point has a neighborhood that is homeomorphic to an open subset of $\mathbb R^n$.

In summary, a topological space $\mathcal{M}$ is a topological $n$-manifold when it is Hausdorff, second countable (its topology has a countable basis), and locally Euclidean of dimension $n$.

Definition 8. Tangent vector.

A tangent vector at a point $p$ in a manifold $\mathcal M$ is a derivation at $p$.

That is, a tangent vector acts as a derivative at $p$ and represents a rate of change at that point.

Definition 9. Tangent space.

As for $\mathbb R^n$, the tangent vectors at $p$ form a vector space $T_p(\mathcal M)$, called the tangent space of $\mathcal M$ at $p$. We also write $T_p\mathcal M$ instead of $T_p(\mathcal M)$.

Definition 10. Normal space.

For a submanifold $\mathcal M \subseteq \mathbb R^m$, the normal space to $\mathcal M$ at $p$ is the subspace $N_p \mathcal M \subset \mathbb R^m$ consisting of all vectors that are orthogonal to $T_p\mathcal M$ with respect to the Euclidean dot product. The normal bundle of $\mathcal M$ is the subset $N\mathcal M \subset \mathbb R^m \times \mathbb R^m$ defined by

N\mathcal M = \coprod_{p \in \mathcal M} N_p \mathcal M = \left\{ (p, v) \in \mathbb R^m \times \mathbb R^m : p \in \mathcal M \text{ and } v \in N_p \mathcal M \right\}

In other words, the normal space at $p$ collects every vector of $\mathbb{R}^m$ that is orthogonal to the tangent space at $p$, and the normal bundle gathers these normal spaces over all points of $\mathcal{M}$.

Methodology

The authors start from the EGSDE model above, in which a suitable guidance function $\epsilon(\mathbf{x},\mathbf{y}_0,t)$ has to be chosen. They note that recent work (Zhao et al., 2022; Bao et al., 2022b) uses a guidance function of the following form.

\epsilon(\mathbf{x},\mathbf{y}_0,t)=\lambda_1\epsilon_1(\mathbf{x},\mathbf{y}_0,t)+\lambda_2\epsilon_2(\mathbf{x},\mathbf{y}_0,t)

Building on this form, the authors propose the Score-Decomposed Diffusion Model (SDDM).

So how should the energy functions be designed for unpaired I2I?

Model Overview

To explicitly optimize the tangled distributions during image generation, we use moments of the perturbed reference image $\mathbf y_0$ as constraints for constructing separable manifolds, thus disentangling the distributions of adjacent time steps.

• For timesteps $t$ and $t-1$, the two manifolds $\mathcal{M}_t$ and $\mathcal{M}_{t-1}$ are separable.

• Hence the conditional distributions of $\mathbf{x}_t$ and $\mathbf{x}_{t-1}$ are separable as well.

• The manifold $\mathcal{M}_t$ decomposes the score function $\mathbf{s}(\mathbf{x},t)$ into two components.

◦ $\mathbf{s}_r(\mathbf{x},t)$: content refinement part.

◦ $\mathbf{s}_d(\mathbf{x},t)$: denoising part.

• Likewise, from $\nabla_\mathbf{x}\epsilon_1(\mathbf{x},\mathbf{y}_0,t)$ we can split off the component $\nabla_\mathbf{x}\epsilon_{1r}(\mathbf{x},\mathbf{y}_0,t)$ that lies in the tangent space $T_{\mathbf{x}_t}\mathcal{M}_t$.

• The optimization process can therefore be carried out in two stages.

◦ Stage 1: optimize on the manifold $\mathcal{M}_t$.

▪ The authors design an algorithm that finds the optimal direction in the tangent space, drawn as the red vector labelled MOO in the paper's overview figure.

◦ Stage 2: map appropriately to the next manifold $\mathcal{M}_{t-1}$.

▪ The MOO vector is substituted into the equation above to obtain $\text{d}\mathbf{x}$, and $\mathbf{x}_t+\text{MOO}$ is mapped to the next manifold.

▪ Here $\delta$ is the restriction onto the manifold $\mathcal{M}_{t-1}$, included for consistency of form.

This looks simple at first glance, but why does it work?

Section 3.2 of the paper provides the mathematical justification.

Decomposition of the Score and Energy Guidance

Suppose the score function is given as follows.

\mathbf{s}(\mathbf{x})=\nabla_\mathbf{x}\log p(\mathbf{x})\in\mathbb{R}^d

Assume the manifold $\mathcal{M}$ is a smooth, compact submanifold of $\mathbb{R}^d$, and let $p_\mathcal{M}(\mathbf{x})$ be the probability distribution restricted to this manifold. Then we can make the following definitions.

Definition 1. The tangent score function $\mathbf{s}_r(\mathbf{x})$.

\mathbf{s}_r(\mathbf{x}):=\nabla_\mathbf{x}\log p_\mathcal{M}(\mathbf{x})

That is, $\mathbf{s}_r(\mathbf{x})$ lives on the manifold $\mathcal{M}$. Given a sequence of manifolds $\{\mathcal{M}_t\}$ and the original score function $\mathbf{s}(\mathbf{x},t)$, $\mathbf{s}_r(\mathbf{x},t)$ is the tangent score function on the manifold $\mathcal{M}_t$.

Definition 2. The normal score function $\mathbf{s}_d(\mathbf{x})$.

\mathbf{s}_d(\mathbf{x}) := \mathbf{s}(\mathbf{x})|_{N_\mathbf{x}\mathcal{M}}

That is, $\mathbf{s}_d(\mathbf{x})$ is the score function on the normal space of the manifold. Likewise, $\mathbf{s}_d(\mathbf{x},t)$ is the normal score function on the manifold $\mathcal{M}_t$.

This gives the following decomposition of the score function.

Lemma 1. $\mathbf{s}(\mathbf{x})=\mathbf{s}_r(\mathbf{x})+\mathbf{s}_d(\mathbf{x})$

This follows once we know that $\mathbf{s}_r(\mathbf{x}) = \mathbf{s}(\mathbf{x})|_{T_\mathbf{x}\mathcal{M}}$, because $\mathbb{R}^d$ splits into the orthogonal direct sum $T_\mathbf{x}\mathcal{M} \oplus N_\mathbf{x}\mathcal{M}$ at every $\mathbf{x} \in \mathcal{M}$.
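Numerically, the decomposition in Lemma 1 is an orthogonal projection. The sketch below assumes the normal space at a point is given by a list of spanning vectors (an assumption made for illustration; the paper derives it from its manifold constraints) and splits a vector into its normal and tangent parts. The helper names `orthonormalize` and `decompose` are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormalize(vectors, tol=1e-12):
    """Gram-Schmidt: return an orthonormal basis of the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:  # skip vectors that are (numerically) linearly dependent
            basis.append([wi / norm for wi in w])
    return basis


def decompose(s, normal_vectors):
    """Split s into (s_r, s_d): tangent part and normal part, with s = s_r + s_d."""
    s_d = [0.0] * len(s)
    for b in orthonormalize(normal_vectors):
        c = dot(s, b)
        s_d = [sd + c * bi for sd, bi in zip(s_d, b)]
    s_r = [si - sdi for si, sdi in zip(s, s_d)]
    return s_r, s_d


if __name__ == "__main__":
    # Unit sphere in R^3 at p = (0, 0, 1): the normal space is spanned by p itself.
    s_r, s_d = decompose([1.0, 2.0, 3.0], [[0.0, 0.0, 1.0]])
    print(s_r, s_d)
```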

In general this decomposition is not very meaningful, because the manifolds of adjacent time steps are coupled with each other. Previous work treated the whole $\cup_t \mathbf{x}_t$ as one entire manifold (Liu et al., 2022) or relied on strong assumptions (Chung et al., 2022). In conditional generation tasks, however, the reference image $\mathbf{y}_0$ can provide a compact manifold at each timestep, and the manifolds of adjacent timesteps can be separated. From this viewpoint, the tangent score function can be regarded as the refinement part on the manifold, and the normal score function becomes the mapping function between the manifolds of adjacent timesteps.

We can therefore state a proposition that describes the manifold.

Proposition 1. At time step $t$, for any single reference image $\mathbf{y}_0$, the perturbed distribution $q_{t|0}(\mathbf{y}_t|\mathbf{y}_0)$ is concentrated on a compact manifold $\mathcal{M}_t \subset \mathbb{R}^d$, and the dimension of $\mathcal{M}_t$ is at most $d-2$ when $d$ is large enough. Suppose the perturbed reference image is distributed as $\mathbf{y}_t = \hat\alpha_t\mathbf{y}_0 + \hat\beta_t \mathbf{z}_t$, where $\mathbf{z}_t \sim \mathcal{N}(0, I)$. The following statistical constraints define such a $(d-2)$-dimensional $\mathcal{M}_t$.

\begin{align} \mu[\mathbf{x}_t]&=\hat \alpha_t\mu[\mathbf{y}_0] \\ \text{Var}[\mathbf{x}_t]&=\hat\alpha_t^2\text{Var}[\mathbf{y}_0]+\hat\beta^2_t. \end{align}

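Spelled out, the distribution $q_{t|0}(\mathbf{y}_t|\mathbf{y}_0)$ produced by the forward process is concentrated on a compact manifold $\mathcal{M}_t \subset \mathbb{R}^d$ whose dimension is at most $d-2$. Because $\mathbf{y}_t = \hat\alpha_t\mathbf{y}_0 + \hat\beta_t \mathbf{z}_t$, constraints on the mean and variance of $\mathbf{x}_t$ define the manifold $\mathcal{M}_t$. Each of the two scalar constraints removes one degree of freedom, so the resulting manifold is $(d-2)$-dimensional, lower than $\mathbb{R}^d$. On top of this, a "chunking trick" is used to lower the dimension further. Such manifolds express the preservation of statistics, which indicates that the tangent space $T_{\mathbf{x}_t}\mathcal{M}_t$ can cleanly separate out the "refinement" part.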
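The concentration claim can be checked empirically on a toy vector. The sketch below assumes that $\mu[\cdot]$ and $\text{Var}[\cdot]$ are taken over the $d$ coordinates of a single image (an interpretation made for this example), draws one perturbed sample, and compares its moments with the two constraints. The names `perturb` and `target_moments` are illustrative only.

```python
import random


def mean(v):
    return sum(v) / len(v)


def variance(v):
    m = mean(v)
    return sum((a - m) ** 2 for a in v) / len(v)


def perturb(y0, alpha, beta, rng):
    """Forward perturbation y_t = alpha * y_0 + beta * z with z ~ N(0, I)."""
    return [alpha * a + beta * rng.gauss(0.0, 1.0) for a in y0]


def target_moments(y0, alpha, beta):
    """Mean and variance prescribed by the two constraints of Proposition 1."""
    return alpha * mean(y0), alpha ** 2 * variance(y0) + beta ** 2


if __name__ == "__main__":
    rng = random.Random(0)
    d = 20000  # a large dimension, standing in for the number of pixels
    y0 = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    yt = perturb(y0, alpha=0.8, beta=0.6, rng=rng)
    print("sample moments:", mean(yt), variance(yt))
    print("target moments:", target_moments(y0, 0.8, 0.6))
```

For large $d$ the sample moments land close to the targets, which is the sense in which the perturbed image concentrates near the set cut out by the two constraints.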

Next, Lemma 2 states that $\mathbf{y}_t$ and $\mathbf{y}_{t-1}$ at adjacent timesteps can also be well separated.

Lemma 2. With the $\mathcal{M}_t$ defined in Proposition 1, assume $t \neq t'$. Then $\mathcal{M}_t$ and $\mathcal{M}_{t'}$ can be well separated. Rigorously, $\forall \varepsilon > 0$, $\exists \mathcal{M}_d$ that divides $\mathbb{R}^d$ into two disconnected spaces $\mathcal{A}, \mathcal{B}$, where $\mathcal{M}_t \subset \mathcal{A}$ and $\mathcal{M}_{t'} \subset \mathcal{B}$.

In words, for $t \neq t'$ there is a set $\mathcal{M}_d$ that cuts $\mathbb{R}^d$ into two disconnected regions, with $\mathcal{M}_t$ lying entirely in one and $\mathcal{M}_{t'}$ entirely in the other.

Therefore $\mathcal{M}_t$ can be used to approximately decompose the score function $\mathbf{s}$ into $\mathbf{s}_r$ and $\mathbf{s}_d$. More generally, the optimization space and the tangent space $T_{\mathbf{x}_t}\mathcal{M}_t$ can be decoupled. With $T_{\mathbf{x}_t}\mathcal{M}_t$, the score function and the energy of the SBDM can be manipulated at a finer granularity. In addition, separating out the "refinement" part keeps the "denoising" part of the score function from being disturbed excessively.

Stage 1: Optimization on Manifold.

Definition 3. Manifold Optimization. 

Manifold optimization (Hu et al., 2020) is the problem of optimizing a real-valued function $f(\mathbf{x})$ over a given Riemannian manifold $\mathcal{M}$. The optimization target is as follows.

\min_{\mathbf{x}\in \mathcal{M}}f(\mathbf{x})

Since the score function $\mathbf{s}(\mathbf{x},t)$ at a given $t$ approximates $\nabla_\mathbf{x}\log q_t(\mathbf{x})$, $\log q_t(\mathbf{x})$ can be regarded as the potential energy of $\mathbf{s}(\mathbf{x},t)$. The score therefore acts as one more energy-function guidance.
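To make Definition 3 concrete, here is a minimal sketch of Riemannian gradient descent on the simplest compact manifold, the unit sphere, rather than on the paper's $\mathcal{M}_t$. Each step projects the Euclidean gradient onto the tangent space and then retracts back to the sphere by renormalizing. The function names and the toy objective $f(\mathbf{x}) = \mathbf{c}\cdot\mathbf{x}$ are assumptions chosen for illustration.

```python
import math


def normalize(v):
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v]


def tangent_project(x, grad):
    """Remove the component of grad along x, the normal direction of the unit sphere at x."""
    c = sum(a * b for a, b in zip(x, grad))
    return [g - c * a for g, a in zip(grad, x)]


def minimize_on_sphere(grad_f, x0, step=0.1, iters=500):
    """Projected gradient descent with a normalization retraction."""
    x = normalize(x0)
    for _ in range(iters):
        r = tangent_project(x, grad_f(x))                     # Riemannian gradient
        x = normalize([a - step * b for a, b in zip(x, r)])   # step, then retract
    return x


if __name__ == "__main__":
    c = [1.0, 2.0, 2.0]
    # f(x) = c . x has constant Euclidean gradient c; on the unit sphere its
    # minimizer is -c / ||c||.
    x_star = minimize_on_sphere(lambda x: c, [1.0, 0.0, 0.0])
    print(x_star)
```

The same two ingredients, a tangent-space direction and a map back onto the manifold, correspond loosely to the tangent-space optimization and the restriction described in this post.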

Definition 4. Pareto optimality on the manifold.

Let $\mathbf{x}_t, \hat{\mathbf{x}}_t \in \mathcal{M}_t$.

• If $\epsilon_{ir}(\mathbf{x}_t,\mathbf{y}_0,t)\leq \epsilon_{ir}(\hat{\mathbf{x}}_t,\mathbf{y}_0,t)$ for all $i$ and $\mathbf{s}_r(\hat{\mathbf{x}}_t,t)\leq\mathbf{s}_r(\mathbf{x}_t,t)$, then $\mathbf{x}_t$ dominates $\hat{\mathbf{x}}_t$.

• If there is no solution $\hat{\mathbf{x}}_t$ that dominates $\mathbf{x}_t$, then $\mathbf{x}_t$ is called Pareto optimal.

What does "dominate" mean here? It is a way of comparing two solutions. $\mathbf{x}_t$ dominates $\hat{\mathbf{x}}_t$ when $\mathbf{x}_t$ is at least as good as $\hat{\mathbf{x}}_t$ on every objective and strictly better on at least one. Conversely, if no $\hat{\mathbf{x}}_t$ can be found that is no worse than $\mathbf{x}_t$ on every objective and strictly better on at least one, then $\mathbf{x}_t$ is Pareto optimal: no other solution improves one objective without worsening another.
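The dominance relation is easy to state in code. The sketch below assumes every objective is a scalar to be minimized and that each candidate is represented by its tuple of objective values; `dominates` and `pareto_front` are illustrative names, not functions from the paper.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one.

    a and b are equal-length sequences of objective values, lower is better.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    candidates = [(1, 3), (2, 2), (3, 3), (2, 4)]
    print(pareto_front(candidates))
```

Here $(1, 3)$ and $(2, 2)$ are both Pareto optimal even though neither is better than the other on both objectives, which is why a multi-objective problem generally has a set of optimal solutions rather than a single one.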

The goal of multi-objective optimization is therefore to find Pareto optimal solutions. As in single-objective optimization, local Pareto optimality can be reached by gradient descent. The authors follow the Multiple Gradient Descent Algorithm (MGDA) (Désidéri, 2012). MGDA also uses the Karush-Kuhn-Tucker (KKT) conditions for multi-objective optimization, which here take the following form.

Theorem 1. K.K.T. conditions on a smooth manifold.
At time step $t$ on the tangent space $T_{\mathbf{x}_t}\mathcal{M}_t$, there $\exists \alpha, \beta^1, \beta^2, ..., \beta^m \geq 0$ such that $\alpha + \sum_{i=1}^m \beta^i = 1$ and $\alpha\mathbf s_r (\mathbf x_t , t) = \sum_{i=1}^m \beta^i \nabla_{\mathbf x_t} \epsilon_{ir} (\mathbf x_t , \mathbf{y}_0 , t)$, where $\mathbf s_r (\mathbf x_t , t)$ is the component of $\mathbf s(\mathbf x_t , t)$ on the tangent space and the $\epsilon_{ir} (\mathbf x_t , \mathbf y_0 , t)$ are functions restricted on the manifold $\mathcal M_t$.

Theorem 1 describes the situation in the tangent space of the manifold at a given time step $t$. When $\epsilon_{ir} (\mathbf x_t , \mathbf y_0 , t)$ is restricted to the manifold $\mathcal M_t$ and $\mathbf s_r (\mathbf x_t , t)$ is the tangent-space component of $\mathbf s(\mathbf x_t , t)$, there exist $\alpha, \beta^1 , \beta^2 , ..., \beta^m \geq 0$ satisfying:

• $\alpha + \sum_{i=1}^m \beta^i = 1$

• $\alpha\mathbf s_r (\mathbf x_t , t) = \sum_{i=1}^m \beta^i \nabla_{\mathbf x_t} \epsilon_{ir} (\mathbf x_t , \mathbf{y}_0 , t)$

Every point satisfying these conditions is called a Pareto stationary point. Every Pareto optimal point is a Pareto stationary point, but the converse does not hold. The coefficients are obtained by solving the following problem (Désidéri, 2012).

\min_{ \substack{ \alpha,\beta^1,\cdots,\beta^m \geq 0 \\ \alpha+\beta^1+\cdots+\beta^m=1} } \left\{ \left\| \alpha \mathbf s_r(\mathbf x_t,t) - \sum^m_{i=1}\beta^i\nabla_{\mathbf x_t} \epsilon_{ir}(\mathbf x_t, \mathbf y_0, t) \right\| \right\}

Solving it yields a descent direction and, eventually, a Pareto stationary point. The authors normalize all gradients first.
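For a single energy function ($m = 1$) the minimization above has a closed form, which the following sketch works out. Writing $\beta = 1 - \alpha$, the vector $\alpha\mathbf{s}_r - \beta\nabla\epsilon_r$ equals $\alpha(\mathbf{s}_r + \nabla\epsilon_r) - \nabla\epsilon_r$, a convex quadratic in $\alpha$ that can be minimized and then clipped to $[0, 1]$. The function name `min_norm_coefficients` is hypothetical, and the general case $m > 1$, which needs an iterative solver, is not covered.

```python
def min_norm_coefficients(s_r, grad_eps):
    """Solve min over alpha + beta = 1, alpha, beta >= 0 of ||alpha*s_r - beta*grad_eps||.

    Returns (alpha, beta, residual), where residual = alpha*s_r - beta*grad_eps.
    A zero residual means the KKT condition alpha*s_r = beta*grad_eps holds.
    """
    # residual(alpha) = alpha * d - grad_eps, with d = s_r + grad_eps
    d = [a + b for a, b in zip(s_r, grad_eps)]
    dd = sum(x * x for x in d)
    if dd == 0.0:
        alpha = 0.5  # the norm does not depend on alpha; any value in [0, 1] works
    else:
        alpha = sum(g * x for g, x in zip(grad_eps, d)) / dd
        alpha = min(1.0, max(0.0, alpha))  # clip the unconstrained minimizer
    beta = 1.0 - alpha
    residual = [alpha * a - beta * b for a, b in zip(s_r, grad_eps)]
    return alpha, beta, residual


if __name__ == "__main__":
    # Orthogonal unit vectors: equal weights, nonzero residual (not stationary).
    print(min_norm_coefficients([1.0, 0.0], [0.0, 1.0]))
    # Identical vectors: the residual vanishes, so the point is Pareto stationary.
    print(min_norm_coefficients([1.0, 0.0], [1.0, 0.0]))
```

If the gradients are normalized beforehand, as the authors do, both inputs would be unit vectors; the sketch leaves that step to the caller.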

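In the image-to-image task there is a manifold $\mathcal M_t$ for each of many timesteps, so for a small $\epsilon$ it suffices to look for a Pareto stationary point within $\mathbf B_\epsilon (\mathbf x_t) \cap \mathcal M_t$, where $\mathbf B_\epsilon (\mathbf x_t)$ is the open ball centered at $\mathbf x_t$ with radius $\epsilon$.

The resulting Stage 1 procedure is summarized as an algorithm in the paper.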

Stage 2. Transformation between adjacent manifolds.

Having optimized on the manifold $\mathcal M_t$, we have obtained an $\mathbf x^\star_t$ that dominates $\mathbf x_t$. We can now map from $\mathcal M_{t}$ to $\mathcal M_{t-1}$ using the "denoising" part $\mathbf s_d(\mathbf x^\star_t,t)$ of the score function, the drift $\mathbf f(\mathbf x^\star_t,t)$, the reverse-time noise, and a restriction function.

First, the properties of the adjacent map are stated in the following proposition.

Proposition 2. Suppose $\mathbf f(\cdot, \cdot)$ is affine. Then the adjacent map has the following properties:

• $\exists!\,\mathbf v_{\mathbf x_t} \in N_{\mathbf x_t} \mathcal M_t$ such that $\mathbf x_t + \mathbf v_{\mathbf x_t} \in \mathcal M_{t-1}$.

◦ There is a unique $\mathbf v_{\mathbf x_t}$ in the normal space $N_{\mathbf x_t} \mathcal M_t$ whose sum with $\mathbf x_t$ lies on the manifold $\mathcal M_{t-1}$.

• $N_{\mathbf x_t} \mathcal M_t = N_{\mathbf x_t +\mathbf v_{\mathbf x_t}} \mathcal M_{t-1}$.

◦ The normal space of $\mathcal M_t$ at $\mathbf x_t$ coincides with the normal space of $\mathcal M_{t-1}$ at $\mathbf x_t +\mathbf v_{\mathbf x_t}$.

• $\mathbf v_{\mathbf x_t}$ is a transition map from $T_{\mathbf x_t} \mathcal M_t$ to $T_{\mathbf x_t+\mathbf v_{\mathbf x_t}} \mathcal M_{t-1}$.

◦ $\mathbf v_{\mathbf x_t}$ carries the tangent space $T_{\mathbf x_t} \mathcal M_t$ of $\mathcal M_t$ over to the tangent space $T_{\mathbf x_t+\mathbf v_{\mathbf x_t}} \mathcal M_{t-1}$ of $\mathcal M_{t-1}$.

• $\mathbf v_{\mathbf x_t}$ is determined by $\mathbf f(\cdot, \cdot)$, $\mathbf x_t$, $\mathbf y_0$ and $g(\cdot)$.

◦ $\mathbf v_{\mathbf x_t}$ is fixed by the points $\mathbf x_t$, $\mathbf y_0$ and the functions $\mathbf f(\cdot, \cdot), g(\cdot)$, so it can be computed from these inputs.
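Under the mean and variance constraints of Proposition 1, one way to picture such a normal vector is moment matching. The constraint gradients at $\mathbf x_t$ are proportional to the all-ones vector and to $\mathbf x_t - \mu[\mathbf x_t]$, so shifting the mean and rescaling the centered vector moves $\mathbf x_t$ along the normal space only. The sketch below is an illustration under that assumption, with the hypothetical name `adjacent_shift`; it is not the paper's $\mathbf v^\star_{\mathbf x_t}$ and ignores noise and the drift.

```python
import math


def mean(v):
    return sum(v) / len(v)


def variance(v):
    m = mean(v)
    return sum((a - m) ** 2 for a in v) / len(v)


def adjacent_shift(x, target_mean, target_var):
    """Return v in span{1, x - mean(x)} such that x + v has the target moments.

    Assumes variance(x) > 0. The positive rescaling is chosen, so the centered
    vector keeps its direction.
    """
    m = mean(x)
    scale = math.sqrt(target_var / variance(x))
    # v = (target_mean - m) * 1 + (scale - 1) * (x - m)
    return [(target_mean - m) + (scale - 1.0) * (a - m) for a in x]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 6.0]
    v = adjacent_shift(x, target_mean=0.5, target_var=2.0)
    moved = [a + b for a, b in zip(x, v)]
    print(mean(moved), variance(moved))
```

Because the moved point is an affine rescaling of the centered vector, the span of the all-ones vector and the centered vector is unchanged, which mirrors the second property in Proposition 2.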

However, using $\mathbf v_{\mathbf x_t}$ itself as the adjacent map loses the influence of the reverse-time noise and of $\mathbf s_d$. Following the reverse SDE, the adjacent map therefore uses an additional term in the normal space $N_{\mathbf x_t}\mathcal{M}_t$ together with a restriction function onto $\mathcal M_{t-1}$; it is denoted $\mathbf v^\star_{\mathbf x_t}$.

This gives the final image generation algorithm.