**Wasserstein GAN + Gradient Penalty**, or **WGAN-GP**, is a generative adversarial network that uses the Wasserstein loss formulation plus a gradient norm penalty to encourage Lipschitz continuity of the critic.

The original [WGAN](https://paperswithcode.com/method/wgan) uses weight clipping to achieve 1-Lipschitz functions, but this can lead to undesirable behaviour by creating pathological value surfaces and capacity underuse, as well as gradient explosion/vanishing without careful tuning of the weight clipping parameter $c$.

A gradient penalty is a soft version of the Lipschitz constraint. It follows from the fact that a differentiable function is 1-Lipschitz if and only if its gradient has norm at most 1 everywhere. The squared difference between the gradient norm and 1 is used as the penalty.
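To make the penalty concrete, here is a minimal pure-Python sketch of the term $\lambda\,(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1)^2$. Several details are assumptions not stated above: the points $\hat{x}$ are sampled on straight lines between real and generated samples, the coefficient `lam` defaults to 10, and the gradient is estimated with finite differences so the example needs no deep learning library. The names `grad_norm` and `gradient_penalty` are hypothetical. A real implementation would use automatic differentiation instead.

```python
import math
import random


def grad_norm(critic, x, h=1e-5):
    """L2 norm of the critic's gradient at x, estimated by central differences."""
    squared = 0.0
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += h
        down[i] -= h
        partial = (critic(up) - critic(down)) / (2 * h)
        squared += partial * partial
    return math.sqrt(squared)


def gradient_penalty(critic, real_batch, fake_batch, lam=10.0, rng=random):
    """Mean of lam * (||grad critic(x_hat)|| - 1)^2 over interpolated points.

    Assumption: x_hat = eps * real + (1 - eps) * fake with eps ~ U[0, 1].
    """
    total = 0.0
    for real, fake in zip(real_batch, fake_batch):
        eps = rng.random()
        x_hat = [eps * r + (1 - eps) * f for r, f in zip(real, fake)]
        total += (grad_norm(critic, x_hat) - 1.0) ** 2
    return lam * total / len(real_batch)


def make_linear_critic(weights):
    """Toy critic f(x) = w . x, whose gradient is w everywhere."""
    return lambda x: sum(w * xi for w, xi in zip(weights, x))


real = [[1.0, 2.0], [0.5, -1.0]]
fake = [[0.0, 0.0], [2.0, 1.0]]

# Gradient norm 1 (0.6^2 + 0.8^2 = 1): the penalty is approximately zero.
print(gradient_penalty(make_linear_critic([0.6, 0.8]), real, fake))

# Gradient norm 5: the penalty is approximately 10 * (5 - 1)^2 = 160.
print(gradient_penalty(make_linear_critic([3.0, 4.0]), real, fake))
```

Because the penalty is two-sided, a critic whose gradient norm falls below 1 is penalised as well, not only one whose norm exceeds 1.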

A **Fire Module** is a building block for convolutional neural networks, notably used as part of [SqueezeNet](https://paperswithcode.com/method/squeezenet). A Fire module comprises a squeeze [convolution](https://paperswithcode.com/method/convolution) layer (which has only 1x1 filters), feeding into an expand layer that has a mix of 1x1 and 3x3 convolution filters. We expose three tunable dimensions (hyperparameters) in a Fire module: $s\_{1x1}$, $e\_{1x1}$, and $e\_{3x3}$. In a Fire module, $s\_{1x1}$ is the number of filters in the squeeze layer (all 1x1), $e\_{1x1}$ is the number of 1x1 filters in the expand layer, and $e\_{3x3}$ is the number of 3x3 filters in the expand layer. When we use Fire modules we set $s\_{1x1}$ to be less than ($e\_{1x1}$ + $e\_{3x3}$), so the squeeze layer helps to limit the number of input channels to the 3x3 filters.
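The effect of the three hyperparameters on model size can be checked with a little arithmetic. The sketch below counts output channels and parameters for one Fire module. It assumes that the two expand branches are concatenated along the channel axis and that every convolution has a bias term. The function name `fire_module_shape` and the sizes in the usage line are illustrative, not taken from the text above.

```python
def fire_module_shape(in_channels, s1x1, e1x1, e3x3):
    """Return (output_channels, parameter_count) for one Fire module.

    Assumptions: the 1x1 and 3x3 expand outputs are concatenated along the
    channel axis, and each convolution carries one bias per filter.
    """
    if s1x1 >= e1x1 + e3x3:
        raise ValueError("expected s1x1 < e1x1 + e3x3")
    # Squeeze layer: s1x1 filters of size 1x1 over in_channels inputs.
    squeeze = in_channels * s1x1 * 1 * 1 + s1x1
    # Expand layer: both branches read the s1x1 squeezed channels.
    expand_1x1 = s1x1 * e1x1 * 1 * 1 + e1x1
    expand_3x3 = s1x1 * e3x3 * 3 * 3 + e3x3
    return e1x1 + e3x3, squeeze + expand_1x1 + expand_3x3


out_channels, params = fire_module_shape(in_channels=96, s1x1=16, e1x1=64, e3x3=64)
print(out_channels, params)  # 128 11920
```

Most of the parameters sit in the 3x3 branch, and its cost scales with $s\_{1x1}$ rather than with the number of input channels, which is why a small squeeze layer keeps the module compact.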

Fire Module

SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size

WGAN GP

Improved Training of Wasserstein GANs

**SCARLET-NAS** is a type of [neural architecture search](https://paperswithcode.com/method/neural-architecture-search) that utilises a learnable stabilizer to calibrate feature deviation, named the Equivariant Learnable Stabilizer (ELS). Previous one-shot approaches can be limited by fixed-depth search spaces. With SCARLET-NAS, we use the equivariant learnable stabilizer on each skip connection. This can lead to improved convergence, more reliable evaluation, and retained equivalence. The third benefit is deemed most important by the authors for scalability.

Source	Improved Training of Wasserstein GANs
Year	2017
Data Source	CC BY-SA - https://paperswithcode.com

Viet-Anh on Software

What is: Wasserstein GAN (Gradient Penalty)?
