What is: Mirror Descent Policy Optimization?

Source: Mirror Descent Policy Optimization
Year: 2020
Data Source: CC BY-SA - https://paperswithcode.com

Mirror Descent Policy Optimization (MDPO) is a policy gradient algorithm that, at each iteration, approximately solves a trust-region problem whose objective is the sum of two terms: a linearization of the standard RL objective function and a proximity term (a KL divergence between the new and the current policy) that keeps two consecutive policies close to each other. The algorithm is derived from Mirror Descent, a general optimization method that can be viewed as a trust-region scheme because it likewise keeps consecutive iterates close to each other.
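In the tabular case, the per-state problem "linearized objective plus KL proximity term" has a closed-form solution: the new policy is the current policy reweighted by exp(step_size * Q). The sketch below illustrates that update under loudly stated assumptions: a small hypothetical MDP with a known model, exact policy evaluation instead of sampled advantage estimates, and a tabular policy instead of a parameterized one optimized with gradient steps. The names `evaluate_q`, `mirror_descent_step`, `kl`, and `step_size` are illustrative and are not taken from any MDPO implementation.

```python
import math

# Hypothetical 2-state, 2-action MDP chosen for illustration (not from the MDPO paper).
# Taking action a moves the agent to state a; R[s][a] is the immediate reward.
STATES = [0, 1]
ACTIONS = [0, 1]
GAMMA = 0.9
P = {s: {a: [(a, 1.0)] for a in ACTIONS} for s in STATES}  # (next_state, prob) pairs
R = {0: {0: 0.0, 1: 1.0}, 1: {0: 0.0, 1: 2.0}}


def evaluate_q(policy, iters=500):
    """Approximate Q^pi by repeatedly applying the Bellman expectation operator."""
    q = {s: {a: 0.0 for a in ACTIONS} for s in STATES}
    for _ in range(iters):
        v = {s: sum(policy[s][a] * q[s][a] for a in ACTIONS) for s in STATES}
        q = {
            s: {a: R[s][a] + GAMMA * sum(p * v[s2] for s2, p in P[s][a]) for a in ACTIONS}
            for s in STATES
        }
    return q


def kl(p, q):
    """KL(p || q) between two action distributions given as {action: prob}."""
    return sum(p[a] * math.log(p[a] / q[a]) for a in ACTIONS if p[a] > 0)


def mirror_descent_step(policy, q, step_size):
    """Solve, in every state s, the KL-regularized linearized problem
        max_pi  sum_a pi(a|s) q(s, a) - (1 / step_size) * KL(pi(.|s) || policy(.|s)).
    Its maximizer is pi(a|s) proportional to policy(a|s) * exp(step_size * q(s, a));
    step_size = 0 (infinitely strong proximity term) leaves the policy unchanged.
    """
    new_policy = {}
    for s in STATES:
        m = max(q[s].values())  # shift for numerical stability; cancels when normalizing
        w = {a: policy[s][a] * math.exp(step_size * (q[s][a] - m)) for a in ACTIONS}
        z = sum(w.values())
        new_policy[s] = {a: w[a] / z for a in ACTIONS}
    return new_policy


policy = {s: {a: 1.0 / len(ACTIONS) for a in ACTIONS} for s in STATES}  # uniform start
step_kls = []  # KL between consecutive policies in state 0, one entry per iteration
for k in range(20):
    q = evaluate_q(policy)
    new_policy = mirror_descent_step(policy, q, step_size=0.5)
    step_kls.append(kl(new_policy[0], policy[0]))
    policy = new_policy

print({s: round(policy[s][1], 4) for s in STATES})  # probability of action 1 per state
```

Using Q instead of the advantage A = Q - V yields the same update, because V(s) shifts every action in state s equally and cancels in the normalization. The step size plays the role of the trust-region radius: a larger `step_size` weakens the proximity term and allows a larger KL divergence between consecutive policies.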