**GradientDICE** is a density ratio learning method for estimating the density ratio between the state distribution of the target policy and the sampling distribution in off-policy reinforcement learning. It optimizes a different objective from [GenDICE](https://arxiv.org/abs/2002.09072) by using the Perron-Frobenius theorem and eliminating GenDICE’s use of divergence, such that nonlinearity in parameterization is not necessary for GradientDICE, which is provably convergent under linear function approximation.

**Conditional Batch Normalization (CBN)** is a class-conditional variant of [batch normalization](https://paperswithcode.com/method/batch-normalization). The key idea is to predict the $\gamma$ and $\beta$ of the batch normalization from an embedding - e.g. a language embedding in VQA. CBN enables the linguistic embedding to manipulate entire feature maps by scaling them up or down, negating them, or shutting them off. CBN has also been used in [GANs](https://paperswithcode.com/methods/category/generative-adversarial-networks) to allow class information to affect the batch normalization parameters.

Consider a single convolutional layer with batch normalization module $\text{BN}\left(F\_{i,c,h,w}|\gamma\_{c}, \beta\_{c}\right)$ for which pretrained scalars $\gamma\_{c}$ and $\beta\_{c}$ are available. We would like to directly predict these affine scaling parameters from, e.g., a language embedding $\mathbf{e\_{q}}$. When starting the training procedure, these parameters must be close to the pretrained values to recover the original [ResNet](https://paperswithcode.com/method/resnet) model as a poor initialization could significantly deteriorate performance. Unfortunately, it is difficult to initialize a network to output the pretrained $\gamma$ and $\beta$. For these reasons, the authors propose to predict a change $\delta\beta\_{c}$ and $\delta\gamma\_{c}$ on the frozen original scalars, for which it is straightforward to initialize a neural network to produce an output with zero-mean and small variance.

The authors use a one-hidden-layer MLP to predict these deltas from a question embedding $\mathbf{e\_{q}}$ for all feature maps within the layer:

$$\Delta\beta = \text{MLP}\left(\mathbf{e\_{q}}\right)$$

$$\Delta\gamma = \text{MLP}\left(\mathbf{e\_{q}}\right)$$

So, given a feature map with $C$ channels, these MLPs output a vector of size $C$. We then add these predictions to the $\beta$ and $\gamma$ parameters:

$$ \hat{\beta}\_{c} = \beta\_{c} + \Delta\beta\_{c} $$

$$ \hat{\gamma}\_{c} = \gamma\_{c} + \Delta\gamma\_{c} $$

Finally, these updated $\hat{β}$ and $\hat{\gamma}$ are used as parameters for the batch normalization: $\text{BN}\left(F\_{i,c,h,w}|\hat{\gamma\_{c}}, \hat{\beta\_{c}}\right)$. The authors freeze all ResNet parameters, including $\gamma$ and $\beta$, during training. A ResNet consists of
four stages of computation, each subdivided in several residual blocks. In each block, the authors apply CBN to the three convolutional layers.

Conditional Batch Normalization

Modulating early visual processing by language

GradientDICE

GradientDICE: Rethinking Generalized Offline Estimation of Stationary Values

**Temporal Pyramid Network**, or **TPN**, is a pyramid level module for action recognition at the feature-level, which can be flexibly integrated into 2D or 3D backbone networks in a plug-and-play manner. The source of features and the fusion of features form a feature hierarchy for the backbone so that it can capture action instances at various tempos. In the TPN, a Backbone Network is used to extract multiple level features, a Spatial Semantic Modulation spatially downsamples features to align semantics, a Temporal Rate Modulation temporally downsamples features to adjust relative tempo among levels, Information Flow aggregates features in various directions to enhance and enrich level-wise representations and Final Prediction rescales and concatenates all levels of pyramid along channel dimension.

Source	GradientDICE: Rethinking Generalized Offline Estimation of Stationary Values
Year	2000
Data Source	CC BY-SA - https://paperswithcode.com