What is: Mirror Descent Policy Optimization?
Source | Mirror Descent Policy Optimization |
Year | 2020 |
Data Source | CC BY-SA - https://paperswithcode.com |
Mirror Descent Policy Optimization (MDPO) is a policy gradient algorithm based on iteratively solving a trust-region problem whose objective is the sum of two terms: a linearization of the standard RL objective function and a proximity term that keeps two consecutive policies close to each other. It builds on Mirror Descent, a general trust-region method that enforces this closeness between consecutive iterates through a Bregman divergence; in MDPO, that divergence is the KL divergence between the new and the previous policy.
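To make the two-term trust-region update concrete, here is a minimal tabular sketch, not the authors' implementation. It assumes exact policy evaluation and a state-wise update. In the tabular case, maximizing the linearized objective <Q^{pi_k}(s, .), pi> minus (1/t_k) * KL(pi || pi_k) over the probability simplex has the closed form pi_{k+1}(a|s) proportional to pi_k(a|s) * exp(t_k * Q^{pi_k}(s, a)). The toy MDP, the discount factor, and the names `evaluate_q` and `mdpo_step` are hypothetical and chosen only for illustration. MDPO with function approximation instead takes several gradient steps on this objective.

```python
import math

# Hypothetical 2-state, 2-action MDP (not from the source), for illustration only.
# P[s][a] lists (next_state, probability); R[s][a] is the immediate reward.
P = {
    0: {0: [(0, 1.0)], 1: [(1, 1.0)]},
    1: {0: [(0, 1.0)], 1: [(1, 1.0)]},
}
R = {
    0: {0: 0.0, 1: 1.0},
    1: {0: 0.0, 1: 2.0},
}
GAMMA = 0.9  # assumed discount factor
STATES, ACTIONS = [0, 1], [0, 1]


def evaluate_q(policy, iters=500):
    """Approximate Q^pi by repeatedly applying the Bellman expectation operator."""
    q = {s: {a: 0.0 for a in ACTIONS} for s in STATES}
    for _ in range(iters):
        v = {s: sum(policy[s][a] * q[s][a] for a in ACTIONS) for s in STATES}
        q = {
            s: {a: R[s][a] + GAMMA * sum(p * v[s2] for s2, p in P[s][a]) for a in ACTIONS}
            for s in STATES
        }
    return q


def mdpo_step(policy, step_size):
    """One tabular mirror-descent update with a KL proximity term (hypothetical helper).

    The closed-form solution of the per-state trust-region problem is
    pi_{k+1}(a|s) ∝ pi_k(a|s) * exp(step_size * Q^{pi_k}(s, a)).
    A smaller step_size weights the KL term more, keeping pi_{k+1} closer to pi_k.
    """
    q = evaluate_q(policy)
    new_policy = {}
    for s in STATES:
        # Shift by the max Q-value before exponentiating for numerical stability;
        # the shift cancels out in the normalization.
        q_max = max(q[s].values())
        weights = {a: policy[s][a] * math.exp(step_size * (q[s][a] - q_max)) for a in ACTIONS}
        z = sum(weights.values())
        new_policy[s] = {a: w / z for a, w in weights.items()}
    return new_policy


# Start from the uniform policy and apply a few MDPO-style updates.
policy = {s: {a: 1.0 / len(ACTIONS) for a in ACTIONS} for s in STATES}
for _ in range(20):
    policy = mdpo_step(policy, step_size=1.0)

print(policy)
```

In this toy MDP, action 1 is optimal in both states, so the probability of action 1 should approach 1 in each state. The multiplicative form of the update is the reason every new policy stays absolutely continuous with respect to the previous one, which is what the KL proximity term requires.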