Viet-Anh on Software Logo

What is: REINFORCE?

Year1999
Data SourceCC BY-SA - https://paperswithcode.com

REINFORCE is a Monte Carlo variant of a policy gradient algorithm in reinforcement learning. The agent collects samples of an episode using its current policy, and uses it to update the policy parameter θ\theta. Since one full trajectory must be completed to construct a sample space, it is updated as an off-policy algorithm.

_θJ(θ)=E_π[G_t_θlnπ_θ(A_tS_t)] \nabla\_{\theta}J\left(\theta\right) = \mathbb{E}\_{\pi}\left[G\_{t}\nabla\_{\theta}\ln\pi\_{\theta}\left(A\_{t}\mid{S\_{t}}\right)\right]

Image Credit: Tingwu Wang