Viet-Anh on Software Logo

What is: Twin Delayed Deep Deterministic?

SourceAddressing Function Approximation Error in Actor-Critic Methods
Year2000
Data SourceCC BY-SA - https://paperswithcode.com

TD3 builds on the DDPG algorithm for reinforcement learning, with a couple of modifications aimed at tackling overestimation bias with the value function. In particular, it utilises clipped double Q-learning, delayed update of target and policy networks, and target policy smoothing (which is similar to a SARSA based update; a safer update, as they provide higher value to actions resistant to perturbations).