**TD3** builds on the [DDPG](https://paperswithcode.com/method/ddpg) algorithm for reinforcement learning, with three modifications aimed at reducing overestimation bias in the value function. In particular, it uses [clipped double Q-learning](https://paperswithcode.com/method/clipped-double-q-learning), delayed updates of the target and policy networks, and [target policy smoothing](https://paperswithcode.com/method/target-policy-smoothing), which resembles a [SARSA](https://paperswithcode.com/method/sarsa)-style update: it is safer because it assigns higher value to actions that are robust to perturbations.

**FastMoE** is a distributed MoE training system based on PyTorch with common accelerators. The system provides a hierarchical interface for both flexible model design and adaptation to different applications, such as [Transformer-XL](https://paperswithcode.com/method/transformer-xl) and Megatron-LM.

FastMoE: A Fast Mixture-of-Expert Training System

**Neural Tangent Transfer**, or **NTT**, is a method for finding trainable sparse networks in a label-free manner. Specifically, NTT finds sparse networks whose training dynamics, as characterized by the neural tangent kernel, mimic those of dense networks in function space.

Source	Addressing Function Approximation Error in Actor-Critic Methods
Year	2018
Data Source	CC BY-SA - https://paperswithcode.com

Viet-Anh on Software

What is: Twin Delayed Deep Deterministic?
