Viet-Anh on Software Logo

What is: IMPALA?

SourceIMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
Year2000
Data SourceCC BY-SA - https://paperswithcode.com

IMPALA, or the Importance Weighted Actor Learner Architecture, is an off-policy actor-critic framework that decouples acting from learning and learns from experience trajectories using V-trace. Unlike the popular A3C-based agents, in which workers communicate gradients with respect to the parameters of the policy to a central parameter server, IMPALA actors communicate trajectories of experience (sequences of states, actions, and rewards) to a centralized learner. Since the learner in IMPALA has access to full trajectories of experience we use a GPU to perform updates on mini-batches of trajectories while aggressively parallelising all time independent operations.

This type of decoupled architecture can achieve very high throughput. However, because the policy used to generate a trajectory can lag behind the policy on the learner by several updates at the time of gradient calculation, learning becomes off-policy. The V-trace off-policy actor-critic algorithm is used to correct for this harmful discrepancy.