What is: GradientDICE?

Source: GradientDICE: Rethinking Generalized Offline Estimation of Stationary Values
Year: 2020
Data Source: CC BY-SA - https://paperswithcode.com

GradientDICE is a method for off-policy reinforcement learning that estimates the density ratio between the state distribution of the target policy and the sampling distribution. It optimizes a different objective from GenDICE: the objective is derived using the Perron-Frobenius theorem and drops GenDICE’s divergence term. As a result, GradientDICE does not require a nonlinear parameterization, and it is provably convergent under linear function approximation.
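
To make the idea concrete, here is a minimal sketch on a toy three-state Markov chain. It assumes the objective has the quadratic form 0.5 * ||D tau - gamma * P^T D tau - (1 - gamma) * mu0||^2 (weighted by D^-1) plus 0.5 * lam * (d_mu^T tau - 1)^2, where D = diag(d_mu). Because the chain is tabular, the sketch minimizes this objective in closed form by solving a linear system; it is not the stochastic primal-dual update that would be used with sampled data. The function name `gradientdice_ratio`, the chain `P`, the distributions `mu0` and `d_mu`, and the weight `lam` are all hypothetical choices made for illustration.

```python
import numpy as np


def gradientdice_ratio(P, mu0, d_mu, gamma, lam=1.0):
    """Closed-form minimizer of the assumed quadratic objective (tabular case).

    P     : (n, n) row-stochastic transition matrix under the target policy
    mu0   : (n,) initial state distribution
    d_mu  : (n,) sampling distribution, strictly positive
    gamma : discount factor in [0, 1]
    lam   : weight of the normalization penalty (d_mu^T tau - 1)^2
    """
    n = len(d_mu)
    D = np.diag(d_mu)
    D_inv = np.diag(1.0 / d_mu)
    A = (np.eye(n) - gamma * P.T) @ D   # residual operator: A @ tau - b
    b = (1.0 - gamma) * mu0
    # Setting the gradient of the objective to zero gives H @ tau = g.
    H = A.T @ D_inv @ A + lam * np.outer(d_mu, d_mu)
    g = A.T @ D_inv @ b + lam * d_mu
    return np.linalg.solve(H, g)


# Toy chain (hypothetical numbers); all transitions are positive.
P = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.6, 0.3],
              [0.3, 0.3, 0.4]])
mu0 = np.array([1.0, 0.0, 0.0])
d_mu = np.array([0.5, 0.3, 0.2])
gamma = 0.9

# Discounted case: compare with the ratio computed from the true
# discounted state distribution d_gamma.
tau = gradientdice_ratio(P, mu0, d_mu, gamma)
d_gamma = (1.0 - gamma) * np.linalg.solve(np.eye(3) - gamma * P.T, mu0)
tau_true = d_gamma / d_mu

# Undiscounted case (gamma = 1): the residual alone no longer pins down
# the scale of tau; the normalization penalty selects the solution for
# which d_mu * tau_avg is a probability distribution.
tau_avg = gradientdice_ratio(P, mu0, d_mu, 1.0)

print("estimated ratio:", tau)
print("true ratio:     ", tau_true)
print("gamma = 1 ratio:", tau_avg)
```

In the discounted case the recovered ratio should match `d_gamma / d_mu`. With `gamma = 1`, `d_mu * tau_avg` should be the stationary distribution of `P`, which for this chain is unique because every transition probability is positive; this is the situation in which the Perron-Frobenius theorem applies. The objective is quadratic in `tau`, so a ratio that is linear in its parameters (here, one parameter per state) is enough to minimize it.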