Viet-Anh on Software Logo

What is: Deterministic Policy Gradient?

Year2014
Data SourceCC BY-SA - https://paperswithcode.com

Deterministic Policy Gradient, or DPG, is a policy gradient method for reinforcement learning. Instead of the policy function π(.s)\pi\left(.\mid{s}\right) being modeled as a probability distribution, DPG considers and calculates gradients for a deterministic policy a=μ_theta(s)a = \mu\_{theta}\left(s\right).