
What is: Path Length Regularization?

Source: Analyzing and Improving the Image Quality of StyleGAN
Year: 2019
Data Source: CC BY-SA - https://paperswithcode.com

Path Length Regularization is a type of regularization for generative adversarial networks that encourages good conditioning in the mapping from latent codes to images. The idea is that a fixed-size step in the latent space $\mathcal{W}$ should result in a non-zero, fixed-magnitude change in the image.

We can measure the deviation from this ideal empirically by stepping in random directions in the image space and observing the corresponding $\mathbf{w}$ gradients. These gradients should have approximately equal length regardless of $\mathbf{w}$ or the image-space direction, which indicates that the mapping from the latent space to the image space is well-conditioned.
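As a toy illustration of this measurement, the sketch below takes two hypothetical 2×2 Jacobians (a rotation scaled by 2 and a badly conditioned diagonal map), steps in random unit image-space directions $\mathbf{y}$, and records the lengths of the resulting latent-space gradients $\mathbf{J}^T\mathbf{y}$. It assumes the Jacobian is already known and uses tiny dimensions purely for readability; the helper names are illustrative.

```python
import math
import random


def transpose_times(jacobian, y):
    """Return J^T y for a Jacobian J stored as a list of rows."""
    num_rows, num_cols = len(jacobian), len(jacobian[0])
    return [sum(jacobian[i][j] * y[i] for i in range(num_rows)) for j in range(num_cols)]


def random_unit_direction(dim, rng):
    """Sample a direction uniformly on the unit sphere in `dim` dimensions."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def gradient_lengths(jacobian, num_directions=1000, seed=0):
    """Lengths ||J^T y||_2 for random unit image-space directions y."""
    rng = random.Random(seed)
    image_dim = len(jacobian)  # rows of J = image-space dimension
    lengths = []
    for _ in range(num_directions):
        y = random_unit_direction(image_dim, rng)
        g = transpose_times(jacobian, y)
        lengths.append(math.sqrt(sum(x * x for x in g)))
    return lengths


# Hypothetical 2x2 Jacobians: a rotation scaled by 2, and a badly conditioned diagonal map.
theta = 0.3
well = [[2 * math.cos(theta), -2 * math.sin(theta)],
        [2 * math.sin(theta), 2 * math.cos(theta)]]
ill = [[10.0, 0.0], [0.0, 0.1]]

for name, jac in [("well-conditioned", well), ("ill-conditioned", ill)]:
    lengths = gradient_lengths(jac)
    print(f"{name}: min {min(lengths):.3f}, max {max(lengths):.3f}")
```

For the scaled rotation every length equals 2, while for the diagonal map the length can fall anywhere between 0.1 and 10 depending on the direction, which is the kind of spread this regularizer is meant to discourage.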

At a single $\mathbf{w} \in \mathcal{W}$, the local metric scaling properties of the generator mapping $g(\mathbf{w}) : \mathcal{W} \rightarrow \mathcal{Y}$ are captured by the Jacobian matrix $\mathbf{J}_{\mathbf{w}} = \partial g(\mathbf{w}) / \partial \mathbf{w}$. Motivated by the desire to preserve the expected lengths of vectors regardless of the direction, we formulate the regularizer as:
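The Jacobian can be made concrete with a finite-difference sketch. The generator below is a hypothetical toy map from a 2-dimensional $\mathcal{W}$ to a 3-pixel $\mathcal{Y}$ (not the StyleGAN synthesis network), and `jacobian_fd` is an illustrative helper that estimates $\mathbf{J}_{\mathbf{w}}$ with central differences; row $i$ holds the partial derivatives of output $i$ with respect to each latent coordinate.

```python
import math


def toy_generator(w):
    """Hypothetical toy generator g: R^2 -> R^3 standing in for an image synthesis network."""
    w0, w1 = w
    return [math.tanh(w0 + w1), w0 * w1, math.sin(w0)]


def jacobian_fd(g, w, eps=1e-6):
    """Central finite-difference estimate of J_w = dg(w)/dw, shape (len(g(w)), len(w))."""
    out_dim = len(g(w))
    jac = [[0.0] * len(w) for _ in range(out_dim)]
    for j in range(len(w)):
        w_plus = list(w)
        w_plus[j] += eps
        w_minus = list(w)
        w_minus[j] -= eps
        g_plus, g_minus = g(w_plus), g(w_minus)
        for i in range(out_dim):
            jac[i][j] = (g_plus[i] - g_minus[i]) / (2 * eps)
    return jac


w = [0.5, -0.2]
J = jacobian_fd(toy_generator, w)
for row in J:
    print([round(x, 4) for x in row])
```

The result is a 3×2 matrix: each column describes how the three "pixels" move when one latent coordinate is nudged, which is exactly the local scaling information the regularizer acts on.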

$$\mathbb{E}_{\mathbf{w},\mathbf{y} \sim \mathcal{N}(0, \mathbf{I})} \left( \lVert \mathbf{J}_{\mathbf{w}}^{T}\mathbf{y} \rVert_{2} - a \right)^{2}$$

where $\mathbf{y}$ are random images with normally distributed pixel intensities, and $\mathbf{w} \sim f(\mathbf{z})$, where $f$ is the mapping network and $\mathbf{z}$ are normally distributed.
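A Monte Carlo estimate of this expectation can be sketched for a toy model. Everything here is a hypothetical stand-in chosen so that the Jacobian has a closed form: the generator is $g(\mathbf{w}) = A \tanh(\mathbf{w})$ with a fixed 3×2 matrix $A$, so $\mathbf{J}_{\mathbf{w}} = A\,\mathrm{diag}(1 - \tanh^2(\mathbf{w}))$, and the mapping is $f(\mathbf{z}) = 0.5\,\mathbf{z}$.

```python
import math
import random

# Hypothetical toy model: 2-dimensional W, 3 "pixels" in Y.
A = [[1.0, 0.5], [0.0, 2.0], [1.5, -1.0]]


def mapping(z):
    """Hypothetical mapping network f(z) = 0.5 z."""
    return [0.5 * zi for zi in z]


def jacobian_transpose_times(w, y):
    """Closed-form J_w^T y for g(w) = A tanh(w), where J_w = A diag(1 - tanh(w)^2)."""
    return [(1.0 - math.tanh(w[j]) ** 2) * sum(A[i][j] * y[i] for i in range(len(A)))
            for j in range(len(w))]


def path_length_penalty(a, num_samples=10000, seed=0):
    """Monte Carlo estimate of E_{w, y ~ N(0, I)} (||J_w^T y||_2 - a)^2 with w = f(z)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(2)]  # z ~ N(0, I)
        w = mapping(z)  # w ~ f(z)
        y = [rng.gauss(0.0, 1.0) for _ in range(len(A))]  # random "image" y ~ N(0, I)
        length = math.sqrt(sum(v * v for v in jacobian_transpose_times(w, y)))
        total += (length - a) ** 2
    return total / num_samples


for a in (1.0, 2.0, 3.0):
    print(a, round(path_length_penalty(a), 4))
```

Because every call reuses the same seed and therefore the same samples, the estimate equals the sample variance of the lengths plus the squared gap between $a$ and their mean, so it is smallest when $a$ matches the average length. That motivates choosing $a$ from the observed lengths rather than fixing it by hand.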

To avoid explicit computation of the Jacobian matrix, we use the identity $\mathbf{J}_{\mathbf{w}}^{T}\mathbf{y} = \nabla_{\mathbf{w}}\left(g(\mathbf{w}) \cdot \mathbf{y}\right)$, which is efficiently computable using standard backpropagation. The constant $a$ is set dynamically during optimization as the long-running exponential moving average of the lengths $\lVert \mathbf{J}_{\mathbf{w}}^{T}\mathbf{y} \rVert_{2}$, allowing the optimization to find a suitable global scale by itself.
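The two ingredients of this paragraph, the vector-Jacobian identity and the moving-average target $a$, can be sketched together. To stay self-contained, the sketch computes $\nabla_{\mathbf{w}}(g(\mathbf{w}) \cdot \mathbf{y})$ with central finite differences as a stand-in for backpropagation, uses a hypothetical toy generator from $\mathbb{R}^2$ to three "pixels", and assumes a decay of 0.01 for the moving average; `grad_of_dot` and `path_length_step` are illustrative names. It only evaluates the penalty and updates $a$; in a real training loop the penalty would also be differentiated with respect to the generator weights, with $a$ treated as a constant.

```python
import math
import random


def toy_generator(w):
    """Hypothetical toy generator g: R^2 -> R^3 (three "pixels")."""
    w0, w1 = w
    return [math.tanh(w0 + w1), w0 * w1, math.sin(w0)]


def grad_of_dot(g, w, y, eps=1e-6):
    """Gradient of the scalar g(w) . y with respect to w, i.e. J_w^T y.

    J_w is never formed. Central finite differences stand in for the single
    backward pass an autodiff framework would use.
    """
    def dot_with_y(v):
        return sum(gi * yi for gi, yi in zip(g(v), y))

    grad = []
    for j in range(len(w)):
        w_plus = list(w)
        w_plus[j] += eps
        w_minus = list(w)
        w_minus[j] -= eps
        grad.append((dot_with_y(w_plus) - dot_with_y(w_minus)) / (2 * eps))
    return grad


def path_length_step(w_batch, a, decay=0.01, seed=0):
    """Return (penalty, updated a) for one batch of latent codes.

    decay=0.01 is an assumed value, not taken from the text above.
    """
    rng = random.Random(seed)
    lengths = []
    for w in w_batch:
        y = [rng.gauss(0.0, 1.0) for _ in range(3)]  # random image with N(0, 1) pixels
        jty = grad_of_dot(toy_generator, w, y)
        lengths.append(math.sqrt(sum(v * v for v in jty)))
    # Long-running exponential moving average of the observed lengths.
    new_a = a + decay * (sum(lengths) / len(lengths) - a)
    # Penalize each length's deviation from the updated target.
    penalty = sum((length - new_a) ** 2 for length in lengths) / len(lengths)
    return penalty, new_a


# Usage: a starts at 0 and is pulled toward the typical length ||J_w^T y||.
rng = random.Random(42)
a = 0.0
for step in range(200):
    w_batch = [[rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)] for _ in range(8)]
    penalty, a = path_length_step(w_batch, a, seed=step)
print(round(a, 3), round(penalty, 3))
```

Over many steps $a$ drifts toward the typical gradient length of the toy generator, so the penalty ends up measuring how far individual lengths stray from that learned scale rather than from a hand-picked constant.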

The authors find that path length regularization leads to more reliable and consistently behaving models, which makes architecture exploration easier. They also observe that the smoother generator is significantly easier to invert.