**Mirror Descent Policy Optimization (MDPO)** is a policy gradient algorithm that iteratively solves a trust-region problem whose objective combines two terms: a linearization of the standard RL objective and a proximity term that keeps two consecutive policies close to each other. It is derived from Mirror Descent, a general trust-region method that attempts to keep consecutive iterates close to each other.
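
To make the two-term objective concrete, here is a minimal sketch for a single state with a tabular policy. It assumes the proximity term is a KL divergence and writes the problem as maximizing expected advantage minus the KL penalty (equivalently, minimizing its negative). Under that assumption the update has a closed form: the new policy is proportional to the old policy times `exp(step_size * advantages)`. The function names, the `step_size` parameter, and the advantage values are illustrative assumptions, not taken from the MDPO paper, and the sketch does not cover function approximation.

```python
import numpy as np

def mirror_descent_step(policy, advantages, step_size):
    """One KL-regularized mirror descent update for a single state.

    Solves  max_p  <p, advantages> - (1 / step_size) * KL(p || policy)
    over the probability simplex. The closed-form solution is
    p proportional to policy * exp(step_size * advantages).
    Assumes every entry of `policy` is strictly positive.
    """
    logits = np.log(policy) + step_size * advantages
    logits = logits - logits.max()  # shift for numerical stability
    new_policy = np.exp(logits)
    return new_policy / new_policy.sum()

def kl(p, q):
    """KL(p || q) for strictly positive probability vectors."""
    return float(np.sum(p * np.log(p / q)))

policy = np.full(3, 1.0 / 3.0)            # uniform starting policy
advantages = np.array([1.0, 0.0, -1.0])   # hypothetical advantage estimates

small = mirror_descent_step(policy, advantages, step_size=0.1)
large = mirror_descent_step(policy, advantages, step_size=2.0)

print("small step:", small, "KL from old policy:", kl(small, policy))
print("large step:", large, "KL from old policy:", kl(large, policy))
```

A smaller `step_size` weights the proximity term more heavily, so the updated policy stays closer to the previous one. A larger `step_size` moves more probability mass toward the action with the highest advantage.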

**Attention Mesh** is a neural network architecture for 3D face mesh prediction that applies attention to semantically meaningful regions. Specifically, region-specific heads transform the feature maps with spatial transformers.

Attention Mesh

Attention Mesh: High-fidelity Face Mesh Prediction in Real-time

MDPO

Mirror Descent Policy Optimization

**Gumbel-Softmax** is a continuous distribution that has the property that it can be smoothly annealed into a categorical distribution, and whose parameter gradients can be easily computed via the reparameterization trick.
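
As a rough illustration of the annealing property, the sketch below draws one Gumbel-Softmax sample with NumPy: Gumbel(0, 1) noise is added to the logits, the sum is divided by a temperature, and a softmax is applied. The function name `gumbel_softmax_sample`, the example probabilities, and the temperatures are assumptions chosen for illustration. Because the sample is a deterministic, differentiable function of the logits once the noise is fixed, an automatic differentiation framework could backpropagate through it; plain NumPy does not compute those gradients.

```python
import numpy as np

def gumbel_softmax_sample(logits, temperature, rng):
    """Draw one relaxed (continuous) sample from a categorical distribution."""
    # Gumbel(0, 1) noise via the inverse CDF of a uniform draw.
    u = rng.uniform(low=1e-12, high=1.0, size=logits.shape)
    gumbel_noise = -np.log(-np.log(u))
    z = (logits + gumbel_noise) / temperature
    z = z - z.max()  # shift for numerical stability
    y = np.exp(z)
    return y / y.sum()

logits = np.log(np.array([0.7, 0.2, 0.1]))  # hypothetical class probabilities

# Reuse the same seed so both samples share the same noise and differ
# only in temperature.
soft = gumbel_softmax_sample(logits, temperature=5.0, rng=np.random.default_rng(0))
hard = gumbel_softmax_sample(logits, temperature=0.01, rng=np.random.default_rng(0))

print("temperature 5.0 :", soft)
print("temperature 0.01:", hard)
```

With the noise held fixed, lowering the temperature concentrates the sample on its largest entry, which is how the distribution approaches a one-hot categorical sample in the limit.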

Source	Mirror Descent Policy Optimization
Year	2020
Data Source	CC BY-SA - https://paperswithcode.com

Viet-Anh on Software

What is: Mirror Descent Policy Optimization?
