What is: Pix2Pix?
Source | Image-to-Image Translation with Conditional Adversarial Networks |
Year | 2016 |
Data Source | CC BY-SA - https://paperswithcode.com |
Pix2Pix is a conditional image-to-image translation architecture that combines a conditional GAN objective with a reconstruction loss. The conditional GAN objective for observed images $x$, output images $y$, and random noise vector $z$ is:
$$ \mathcal{L}\_{cGAN}\left(G, D\right) = \mathbb{E}\_{x,y}\left[\log D\left(x, y\right)\right] + \mathbb{E}\_{x,z}\left[\log\left(1 - D\left(x, G\left(x, z\right)\right)\right)\right] $$
We augment this with an $L1$ reconstruction term:
$$ \mathcal{L}\_{L1}\left(G\right) = \mathbb{E}\_{x,y,z}\left[||y - G\left(x, z\right)||\_{1}\right] $$
which gives the final objective:
$$ G^{*} = \arg\min\_{G}\max\_{D}\mathcal{L}\_{cGAN}\left(G, D\right) + \lambda\mathcal{L}\_{L1}\left(G\right) $$
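The generator side of this objective can be sketched in PyTorch as follows. This is a minimal illustration, not the reference implementation: `generator_loss` is a hypothetical helper, and it uses the non-saturating adversarial term commonly substituted in practice for $\log(1 - D(\cdot))$; the weight $\lambda = 100$ follows the paper's default.

```python
import torch
import torch.nn.functional as F

def generator_loss(disc_fake_logits, fake_images, real_images, lam=100.0):
    """Generator-side Pix2Pix loss: adversarial term + lambda * L1 term."""
    # Non-saturating GAN term: push D(x, G(x, z)) toward "real",
    # implemented as binary cross-entropy against an all-ones target.
    adv = F.binary_cross_entropy_with_logits(
        disc_fake_logits, torch.ones_like(disc_fake_logits))
    # L1 reconstruction term pulls generated outputs toward ground truth y.
    l1 = F.l1_loss(fake_images, real_images)
    return adv + lam * l1
```

The $L1$ term (rather than $L2$) is chosen in the paper because it encourages less blurring in the outputs, while the adversarial term supplies high-frequency detail.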
The architectures employed for the generator and discriminator closely follow [DCGAN](https://paperswithcode.com/method/dcgan), with a few modifications:
- Concatenated skip connections are used to "shuttle" low-level information between the input and output, similar to a [U-Net](https://paperswithcode.com/method/u-net).
- The use of a [PatchGAN](https://paperswithcode.com/method/patchgan) discriminator that only penalizes structure at the scale of patches.
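The PatchGAN discriminator can be sketched as below, assuming a PyTorch implementation. Layer widths follow the commonly used 70×70-receptive-field configuration; the class name and exact hyperparameters here are illustrative, not taken from the paper's code. The discriminator sees the input image concatenated with either the real or generated output, and emits one logit per patch rather than a single scalar.

```python
import torch
import torch.nn as nn

class PatchGAN(nn.Module):
    """Convolutional discriminator that classifies overlapping patches
    as real/fake, producing a grid of logits instead of one scalar."""

    def __init__(self, in_channels=6):  # input and output images concatenated
        super().__init__()

        def block(cin, cout, stride, norm=True):
            layers = [nn.Conv2d(cin, cout, kernel_size=4, stride=stride, padding=1)]
            if norm:
                layers.append(nn.BatchNorm2d(cout))
            layers.append(nn.LeakyReLU(0.2))
            return layers

        self.net = nn.Sequential(
            *block(in_channels, 64, 2, norm=False),
            *block(64, 128, 2),
            *block(128, 256, 2),
            *block(256, 512, 1),
            # One logit per receptive-field patch of the input pair.
            nn.Conv2d(512, 1, kernel_size=4, stride=1, padding=1),
        )

    def forward(self, x, y):
        # Condition on the input image x by channel-wise concatenation with y.
        return self.net(torch.cat([x, y], dim=1))
```

For a 256×256 input pair this produces a 30×30 grid of patch logits; averaging the per-patch losses penalizes structure only at the patch scale, which is what lets the cheaper $L1$ term handle low-frequency correctness.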