What is: LAPGAN?

Source: Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks
Year: 2015
Data Source: CC BY-SA - https://paperswithcode.com

A LAPGAN, or Laplacian Generative Adversarial Network, is a type of generative adversarial network that uses a Laplacian pyramid representation of images. In the sampling procedure following training, we have a set of generative convnet models $\{G_0, \dots, G_K\}$, each of which captures the distribution of coefficients $h_k$ for natural images at a different level of the Laplacian pyramid. Sampling an image is akin to a reconstruction procedure, except that the generative models are used to produce the $h_k$'s:

$$\tilde{I}_k = u\left(\tilde{I}_{k+1}\right) + \tilde{h}_k = u\left(\tilde{I}_{k+1}\right) + G_k\left(z_k,\, u\left(\tilde{I}_{k+1}\right)\right)$$

The recurrence starts by setting $\tilde{I}_{K+1} = 0$ and using the model at the final level $G_K$ to generate a residual image $\tilde{I}_K$ using noise vector $z_K$: $\tilde{I}_K = G_K(z_K)$. Models at all levels except the final are conditional generative models that take an upsampled version of the current image $\tilde{I}_{k+1}$ as a conditioning variable, in addition to the noise vector $z_k$.
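
To make the recurrence concrete, here is a minimal sketch of this sampling loop in Python/NumPy. The `upsample` helper is a naive stand-in for the pyramid upsampling operator $u(\cdot)$, and `generators` is a hypothetical list of trained models ordered $[G_0, \dots, G_K]$ with illustrative call signatures; none of these names come from the paper.

```python
import numpy as np

def upsample(img):
    """Naive 2x upsampling by pixel repetition; a stand-in for the
    smoothed pyramid upsampling operator u(.) used in the paper."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def lapgan_sample(generators, noise_dim):
    """Sampling recurrence: start from I~_{K+1} = 0, generate the coarsest
    image with G_K, then repeatedly upsample and add a generated residual.

    `generators` is assumed ordered [G_0, ..., G_K] (fine to coarse);
    each G_k is a callable whose signature is illustrative only.
    """
    K = len(generators) - 1
    # Final level: an unconditional GAN produces the coarse image directly.
    z = np.random.randn(noise_dim)
    img = generators[K](z)                   # I~_K = G_K(z_K)
    # Remaining levels: conditional GANs produce residuals h~_k.
    for k in range(K - 1, -1, -1):
        z = np.random.randn(noise_dim)
        cond = upsample(img)                 # u(I~_{k+1})
        img = cond + generators[k](z, cond)  # I~_k = u(I~_{k+1}) + G_k(z_k, u(I~_{k+1}))
    return img
```

Note that only the coarsest model is called unconditionally; every finer model receives the upsampled current image as its conditioning input.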

The generative models $\{G_0, \dots, G_K\}$ are trained using the CGAN approach at each level of the pyramid. Specifically, we construct a Laplacian pyramid from each training image $I$. At each level we make a stochastic choice (with equal probability) to either (i) construct the coefficients $h_k$ using the standard Laplacian pyramid coefficient generation procedure or (ii) generate them using $G_k$:

$$\tilde{h}_k = G_k\left(z_k,\, u\left(I_{k+1}\right)\right)$$

Here $G_k$ is a convnet which uses a coarse scale version of the image $l_k = u(I_{k+1})$ as an input, as well as noise vector $z_k$. $D_k$ takes as input $h_k$ or $\tilde{h}_k$, along with the low-pass image $l_k$ (which is explicitly added to $h_k$ or $\tilde{h}_k$ before the first convolution layer), and predicts whether the image was real or generated. At the final scale of the pyramid, the low-frequency residual is sufficiently small that it can be directly modeled with a standard GAN: $\tilde{h}_K = G_K(z_K)$, and $D_K$ only has $h_K$ or $\tilde{h}_K$ as input.
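
A similarly hedged sketch of how a single level-$k$ training example for $D_k$ can be assembled, with a naive `downsample` standing in for the pyramid's blur-and-decimate operator and `upsample` as before standing in for $u(\cdot)$. The call signature of `G_k` and all helper names are illustrative assumptions; the GAN loss and parameter updates are omitted.

```python
import numpy as np

def upsample(img):
    """Stand-in for u(.), as in the sampling sketch above."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def downsample(img):
    """Naive 2x downsampling by average pooling (assumes even height
    and width); a stand-in for the pyramid's blur-and-decimate operator."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def level_k_training_input(I_k, G_k, noise_dim):
    """Builds one training example for level k of the pyramid.

    Returns (d_input, label): the image shown to D_k and whether it
    holds real coefficients (1) or generated ones (0).
    """
    I_k1 = downsample(I_k)   # I_{k+1}, the next-coarser pyramid level
    l_k = upsample(I_k1)     # l_k = u(I_{k+1}), the low-pass image
    if np.random.rand() < 0.5:
        h = I_k - l_k        # real coefficients h_k (standard pyramid procedure)
        label = 1
    else:
        z = np.random.randn(noise_dim)
        h = G_k(z, l_k)      # generated coefficients h~_k = G_k(z_k, u(I_{k+1}))
        label = 0
    # D_k sees the coefficients with the low-pass image added back in,
    # mirroring the explicit addition before its first convolution layer.
    return l_k + h, label
```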

Breaking the generation into successive refinements is the key idea. This gives up any “global” notion of fidelity: no network is ever trained to discriminate the output of the full cascade from a real image. Instead, the focus is on making each refinement step plausible.