
What is: Content-Conditioned Style Encoder?

Source: COCO-FUNIT: Few-Shot Unsupervised Image Translation with a Content Conditioned Style Encoder
Year: 2020
Data Source: CC BY-SA - https://paperswithcode.com

The Content-Conditioned Style Encoder, or COCO, is a style encoder used for image-to-image translation in the COCO-FUNIT architecture. Unlike the style encoder in FUNIT, COCO takes both the content image and the style image as input. With this content-conditioning scheme, we create a direct feedback path during learning that lets the content image influence how the style code is computed. It also helps reduce the direct influence of the style image on the extracted style code.

The bottom part of the figure details the architecture. First, the content image is fed into an encoder $E_{S,C}$ to compute a spatial feature map. This content feature map is then mean-pooled and mapped to a vector $\zeta_{c}$. Similarly, the style image is fed into an encoder $E_{S,S}$ to compute a spatial feature map. The style feature map is then mean-pooled and concatenated with an input-independent bias vector: the constant style bias (CSB). Note that while the regular bias in deep networks is added to the activations, the CSB is concatenated with the activations. The CSB provides a fixed input to the style encoder, which helps compute a style code that is less sensitive to variations in the style image.
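A minimal PyTorch sketch of the style branch with the CSB trick is shown below. The encoder body, layer sizes, and the class name `StyleBranchWithCSB` are illustrative assumptions, not the paper's exact implementation; the point is that the CSB is a learnable, input-independent vector that is concatenated with (not added to) the mean-pooled style features.

```python
# A sketch of E_{S,S} plus the constant style bias (CSB); layer sizes are assumptions.
import torch
import torch.nn as nn


class StyleBranchWithCSB(nn.Module):
    def __init__(self, in_channels=3, feat_dim=256, csb_dim=256):
        super().__init__()
        # E_{S,S}: a small conv encoder producing a spatial style feature map (illustrative)
        self.style_encoder = nn.Sequential(
            nn.Conv2d(in_channels, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, feat_dim, 4, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Constant style bias: an input-independent learnable vector that is
        # CONCATENATED with the pooled style features rather than added to them.
        self.csb = nn.Parameter(torch.zeros(csb_dim))

    def forward(self, style_image):
        feat = self.style_encoder(style_image)          # (B, feat_dim, H', W')
        pooled = feat.mean(dim=(2, 3))                  # mean-pool -> (B, feat_dim)
        csb = self.csb.unsqueeze(0).expand(pooled.size(0), -1)
        return torch.cat([pooled, csb], dim=1)          # (B, feat_dim + csb_dim)
```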

The concatenation of the style vector and the CSB is mapped to a vector $\zeta_{s}$ via a fully connected layer. We then compute the element-wise product of $\zeta_{c}$ and $\zeta_{s}$, which is the final style code. The style code is then mapped to the AdaIN parameters for generating the translation. Through this element-wise product, the resulting style code is heavily influenced by the content image. One way to look at this mechanism is that it produces a customized style code for the input content image.
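The following sketch shows how $\zeta_{c}$ and $\zeta_{s}$ could be formed and combined into the final style code, which is then mapped to AdaIN parameters. The class name `COCOStyleCode`, the fully connected layers, and all dimensions are assumptions chosen to match the sketch above, not the authors' exact code.

```python
# A sketch of the content-conditioned style code: zeta_c * zeta_s -> AdaIN parameters.
import torch.nn as nn


class COCOStyleCode(nn.Module):
    def __init__(self, content_dim=256, style_plus_csb_dim=512, code_dim=256, num_adain_params=2048):
        super().__init__()
        self.to_zeta_c = nn.Linear(content_dim, code_dim)         # pooled content -> zeta_c
        self.to_zeta_s = nn.Linear(style_plus_csb_dim, code_dim)  # [pooled style, CSB] -> zeta_s
        self.to_adain = nn.Linear(code_dim, num_adain_params)     # style code -> AdaIN parameters

    def forward(self, pooled_content, style_with_csb):
        zeta_c = self.to_zeta_c(pooled_content)
        zeta_s = self.to_zeta_s(style_with_csb)
        style_code = zeta_c * zeta_s          # element-wise product: content-conditioned style code
        return self.to_adain(style_code)      # parameters that modulate the decoder's AdaIN layers
```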

The COCO is used as a drop-in replacement for the style encoder in FUNIT. Let $\phi$ denote the COCO mapping. The translation output is then computed via

$$z_{c}=E_{C}\left(x_{c}\right), \quad z_{s}=\phi\left(E_{S,S}\left(x_{s}\right), E_{S,C}\left(x_{c}\right)\right), \quad \bar{x}=F\left(z_{c}, z_{s}\right)$$

The style code extracted by the COCO is more robust to variations in the style image. Note that we set $E_{S,C} \equiv E_{C}$ to keep the number of parameters in our model similar to that in FUNIT.
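To tie the pieces together, here is an end-to-end sketch of the translation pass above. It reuses the `StyleBranchWithCSB` and `COCOStyleCode` sketches from earlier and adds hypothetical stand-ins for the content encoder $E_{C}$ and the decoder $F$; all shapes, class names, and the toy decoder are assumptions for illustration only.

```python
# End-to-end sketch: z_c = E_C(x_c), z_s = phi(E_{S,S}(x_s), E_{S,C}(x_c)), x_bar = F(z_c, z_s).
# Assumes StyleBranchWithCSB and COCOStyleCode from the sketches above are in scope.
import torch
import torch.nn as nn


class ContentEncoder(nn.Module):
    """Stand-in for E_C; also reused as E_{S,C} (weight sharing, E_{S,C} ≡ E_C)."""
    def __init__(self, in_channels=3, feat_dim=256):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_channels, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, feat_dim, 4, stride=2, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.body(x)                              # spatial content code z_c


class Decoder(nn.Module):
    """Toy stand-in for F; a real decoder would inject z_s through AdaIN layers."""
    def __init__(self, feat_dim=256, adain_dim=2048, out_channels=3):
        super().__init__()
        self.modulate = nn.Linear(adain_dim, feat_dim)   # toy use of the AdaIN parameters
        self.to_rgb = nn.ConvTranspose2d(feat_dim, out_channels, 4, stride=4)

    def forward(self, z_c, z_s):
        scale = self.modulate(z_s).unsqueeze(-1).unsqueeze(-1)
        return self.to_rgb(z_c * scale)


content_encoder = ContentEncoder()            # E_C, shared with E_{S,C}
style_branch = StyleBranchWithCSB()           # E_{S,S} + constant style bias
coco = COCOStyleCode()                        # phi
decoder = Decoder()                           # F

x_c = torch.randn(1, 3, 128, 128)             # content image
x_s = torch.randn(1, 3, 128, 128)             # style image

z_c = content_encoder(x_c)                               # z_c = E_C(x_c)
pooled_content = z_c.mean(dim=(2, 3))                    # mean-pooled content feature (E_{S,C} ≡ E_C)
z_s = coco(pooled_content, style_branch(x_s))            # z_s = phi(E_{S,S}(x_s), E_{S,C}(x_c))
x_bar = decoder(z_c, z_s)                                # translation x_bar = F(z_c, z_s)
```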