**IMPALA**, or the **Importance Weighted Actor Learner Architecture**, is an off-policy actor-critic framework that decouples acting from learning and learns from experience trajectories using [V-trace](https://paperswithcode.com/method/v-trace). Unlike the popular [A3C](https://paperswithcode.com/method/a3c)-based agents, in which workers communicate gradients with respect to the parameters of the policy to a central parameter server, IMPALA actors communicate trajectories of experience (sequences of states, actions, and rewards) to a centralized learner. Since the learner in IMPALA has access to full trajectories of experience we use a GPU to perform updates on mini-batches of trajectories while aggressively parallelising all time independent operations. 

This type of decoupled architecture can achieve very high throughput. However, because the policy used to generate a trajectory can lag behind the policy on the learner by several updates at the time of gradient calculation, learning becomes off-policy. The V-trace off-policy actor-critic algorithm is used to correct for this harmful discrepancy.

**Jukebox** is a model that generates music with singing in the raw audio domain. It tackles the long context of raw audio using a multi-scale [VQ-VAE](https://paperswithcode.com/method/vq-vae) to compress it to discrete codes, and modeling those using [autoregressive Transformers](https://paperswithcode.com/methods/category/autoregressive-transformers). It can condition on artist and genre to steer the musical and vocal style, and on unaligned lyrics to make the singing more controllable.

Three separate VQ-VAE models are trained with different temporal resolutions. At each level, the input audio is segmented and encoded into latent vectors $\mathbf{h}\_{t}$, which are then quantized to the closest codebook vectors $\mathbf{e}\_{z\_{t}}$. The code $z\_{t}$ is a discrete representation of the audio that we later train our prior on. The decoder takes the sequence of codebook vectors and reconstructs the audio. The top level learns the highest degree of abstraction, since it is encoding longer audio per token while keeping the codebook size the same. Audio can be reconstructed using the codes at any one of the abstraction levels, where the least abstract bottom-level codes result in the highest-quality audio.

Jukebox

Jukebox: A Generative Model for Music

IMPALA

IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures

learn to explore the sample relationships via transformer networks

Source	IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
Year	2000
Data Source	CC BY-SA - https://paperswithcode.com