What is: Mirror Descent Policy Optimization?
| Source | Mirror Descent Policy Optimization | 
| Year | 2020 | 
| Data Source | CC BY-SA - https://paperswithcode.com | 
Mirror Descent Policy Optimization (MDPO) is a policy gradient algorithm that iteratively solves a trust-region problem whose objective is a sum of two terms: a linearization of the standard RL objective function and a proximity term that restricts two consecutive policies to be close to each other. It is derived from Mirror Descent, a general trust-region method from optimization that keeps consecutive iterates close to each other.
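
To make the two-term objective concrete, here is a minimal sketch of a single mirror descent step for a tabular policy in one state. It is not the MDPO algorithm itself. It assumes the proximity term is the KL divergence to the previous policy, in which case the step has a closed form: the new policy is proportional to the old policy times `exp(step_size * advantage)`. The names `mirror_descent_step`, `advantages`, and `step_size` are hypothetical and chosen only for illustration.

```python
import math


def mirror_descent_step(pi_old, advantages, step_size):
    """One mirror descent step on the probability simplex (illustrative only).

    Assumption: the proximity term is KL(pi || pi_old). The step then maximizes
        sum_a pi[a] * advantages[a] - (1 / step_size) * KL(pi || pi_old)
    over distributions pi, and the maximizer is
        pi_new[a] proportional to pi_old[a] * exp(step_size * advantages[a]).
    Requires every entry of pi_old to be strictly positive.
    """
    logits = [math.log(p) + step_size * a for p, a in zip(pi_old, advantages)]
    shift = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - shift) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with q > 0 everywhere."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical example: four actions, uniform starting policy.
pi_old = [0.25, 0.25, 0.25, 0.25]
advantages = [1.0, 0.0, -1.0, 0.0]

pi_small = mirror_descent_step(pi_old, advantages, step_size=0.1)
pi_large = mirror_descent_step(pi_old, advantages, step_size=2.0)
```

A smaller `step_size` puts more weight on the proximity term, so `pi_small` stays closer to `pi_old` than `pi_large` does, while both shift probability toward the action with the highest advantage. With neural network policies the closed form is generally not available, and the trust-region problem would instead be solved approximately, for example with several gradient steps per iteration.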
