
What is: CycleGAN?

Source: Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks
Year: 2017
Data Source: CC BY-SA - https://paperswithcode.com

CycleGAN, or Cycle-Consistent GAN, is a type of generative adversarial network for unpaired image-to-image translation. For two domains $X$ and $Y$, CycleGAN learns a mapping $G : X \rightarrow Y$ and a mapping $F : Y \rightarrow X$. The novelty lies in enforcing the intuition that these mappings should be inverses of each other and that both mappings should be bijections. This is achieved through a cycle consistency loss that encourages $F(G(x)) \approx x$ and $G(F(y)) \approx y$. Combining this loss with the adversarial losses on $X$ and $Y$ yields the full objective for unpaired image-to-image translation.

For the mapping $G : X \rightarrow Y$ and its discriminator $D_{Y}$, we have the objective:

$$\mathcal{L}_{GAN}\left(G, D_{Y}, X, Y\right) = \mathbb{E}_{y \sim p_{data}(y)}\left[\log D_{Y}(y)\right] + \mathbb{E}_{x \sim p_{data}(x)}\left[\log\left(1 - D_{Y}(G(x))\right)\right]$$

where $G$ tries to generate images $G(x)$ that look similar to images from domain $Y$, while $D_{Y}$ tries to discriminate between translated samples $G(x)$ and real samples $y$. A similar loss is postulated for the mapping $F : Y \rightarrow X$ and its discriminator $D_{X}$.
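As a concrete illustration, the adversarial objective above can be evaluated from discriminator outputs alone. The sketch below is a minimal NumPy version using hypothetical probabilities rather than a trained model:

```python
import numpy as np

def gan_loss(d_real, d_fake):
    """Adversarial objective for one direction of translation.

    d_real: discriminator outputs on real target-domain samples, in (0, 1)
    d_fake: discriminator outputs on translated samples, in (0, 1)
    """
    eps = 1e-12  # numerical safety for log(0)
    return np.mean(np.log(d_real + eps)) + np.mean(np.log(1.0 - d_fake + eps))

# Hypothetical discriminator outputs (illustrative values only)
d_real = np.array([0.9, 0.8, 0.95])
d_fake = np.array([0.1, 0.2, 0.05])

# Each log term is <= 0, so the loss is <= 0; the discriminator maximizes
# it (toward 0) while the generator tries to drive d_fake toward 1.
print(gan_loss(d_real, d_fake))
```

The discriminator ascends this objective while the generator descends it, which is the min-max structure spelled out in the full objective below.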

The cycle consistency loss reduces the space of possible mapping functions by enforcing forward and backward consistency:

$$\mathcal{L}_{cyc}\left(G, F\right) = \mathbb{E}_{x \sim p_{data}(x)}\left[\lVert F(G(x)) - x \rVert_{1}\right] + \mathbb{E}_{y \sim p_{data}(y)}\left[\lVert G(F(y)) - y \rVert_{1}\right]$$
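The cycle consistency term is just a mean L1 distance in each direction. A minimal NumPy sketch, with `x_rec` and `y_rec` standing in for $F(G(x))$ and $G(F(y))$:

```python
import numpy as np

def cycle_loss(x, x_rec, y, y_rec):
    """L1 cycle consistency loss in both directions.

    x_rec plays the role of F(G(x)); y_rec plays the role of G(F(y)).
    """
    return np.mean(np.abs(x_rec - x)) + np.mean(np.abs(y_rec - y))

# Tiny 2x2 "images" (illustrative values only)
x = np.array([[0.0, 1.0], [0.5, 0.25]])
y = np.array([[1.0, 0.0], [0.75, 0.5]])

print(cycle_loss(x, x, y, y))        # perfect reconstruction -> 0.0
print(cycle_loss(x, x + 0.1, y, y))  # uniform error of 0.1 -> about 0.1
```

Minimizing this term pushes both composed mappings toward the identity, which is what makes the unpaired setting tractable.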

The full objective is:

$$\mathcal{L}_{GAN}\left(G, F, D_{X}, D_{Y}\right) = \mathcal{L}_{GAN}\left(G, D_{Y}, X, Y\right) + \mathcal{L}_{GAN}\left(F, D_{X}, Y, X\right) + \lambda\mathcal{L}_{cyc}\left(G, F\right)$$

Where we aim to solve:

$$G^{*}, F^{*} = \arg \min_{G, F} \max_{D_{X}, D_{Y}} \mathcal{L}_{GAN}\left(G, F, D_{X}, D_{Y}\right)$$
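Assembling the full objective is a weighted sum of the two adversarial losses and the cycle loss. The values below are hypothetical placeholders for quantities computed as above; $\lambda = 10$ is the weight used in the paper:

```python
# Hypothetical per-batch loss values (not from a real training run)
l_gan_G = -0.25   # adversarial loss for G and D_Y
l_gan_F = -0.30   # adversarial loss for F and D_X
l_cyc = 0.40      # cycle consistency loss

lam = 10.0  # lambda: cycle consistency is weighted heavily relative to the GAN terms

full_objective = l_gan_G + l_gan_F + lam * l_cyc
print(full_objective)  # 3.45
```

The generators descend this quantity while the discriminators ascend their adversarial terms, realizing the min-max above.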

For the original architecture the authors use:

  • two stride-2 convolutions, several residual blocks, and two fractionally strided convolutions with stride $\frac{1}{2}$
  • instance normalization
  • PatchGANs for the discriminator
  • a least-squares loss for the GAN objectives.
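The least-squares objective replaces the log terms above with squared errors, which tends to stabilize training. A minimal NumPy sketch with hypothetical discriminator outputs:

```python
import numpy as np

def lsgan_d_loss(d_real, d_fake):
    """Least-squares discriminator loss: push D(real) toward 1, D(fake) toward 0."""
    return np.mean((d_real - 1.0) ** 2) + np.mean(d_fake ** 2)

def lsgan_g_loss(d_fake):
    """Least-squares generator loss: push D(fake) toward 1."""
    return np.mean((d_fake - 1.0) ** 2)

# Hypothetical discriminator outputs (illustrative values only)
d_real = np.array([0.9, 0.8])
d_fake = np.array([0.2, 0.1])

print(lsgan_d_loss(d_real, d_fake))  # small: D already separates the domains
print(lsgan_g_loss(d_fake))          # larger: G has not yet fooled D
```

Unlike the log loss, these quadratic targets give non-saturating gradients even when the discriminator is confident, which is the usual motivation for the least-squares variant.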