**Jukebox** is a model that generates music with singing in the raw audio domain. It tackles the long context of raw audio using a multi-scale [VQ-VAE](https://paperswithcode.com/method/vq-vae) to compress it to discrete codes, and modeling those using [autoregressive Transformers](https://paperswithcode.com/methods/category/autoregressive-transformers). It can condition on artist and genre to steer the musical and vocal style, and on unaligned lyrics to make the singing more controllable.

Three separate VQ-VAE models are trained with different temporal resolutions. At each level, the input audio is segmented and encoded into latent vectors $\mathbf{h}\_{t}$, which are then quantized to the closest codebook vectors $\mathbf{e}\_{z\_{t}}$. The code $z\_{t}$ is a discrete representation of the audio that we later train our prior on. The decoder takes the sequence of codebook vectors and reconstructs the audio. The top level learns the highest degree of abstraction, since it is encoding longer audio per token while keeping the codebook size the same. Audio can be reconstructed using the codes at any one of the abstraction levels, where the least abstract bottom-level codes result in the highest-quality audio.

AlphaFold is a deep learning based algorithm for accurate protein structure prediction. AlphaFold incorporates physical and biological knowledge about protein structure, leveraging multi-sequence alignments, into the design of the deep learning algorithm.

Description from: [Highly accurate protein structure prediction with AlphaFold](https://paperswithcode.com/paper/highly-accurate-protein-structure-prediction)

Image credit: [DeepMind](https://deepmind.com/blog/article/alphafold-a-solution-to-a-50-year-old-grand-challenge-in-biology)

AlphaFold

Highly accurate protein structure prediction with AlphaFold

Jukebox

Jukebox: A Generative Model for Music

**IMPALA**, or the **Importance Weighted Actor Learner Architecture**, is an off-policy actor-critic framework that decouples acting from learning and learns from experience trajectories using [V-trace](https://paperswithcode.com/method/v-trace). Unlike the popular [A3C](https://paperswithcode.com/method/a3c)-based agents, in which workers communicate gradients with respect to the parameters of the policy to a central parameter server, IMPALA actors communicate trajectories of experience (sequences of states, actions, and rewards) to a centralized learner. Since the learner in IMPALA has access to full trajectories of experience we use a GPU to perform updates on mini-batches of trajectories while aggressively parallelising all time independent operations. 

This type of decoupled architecture can achieve very high throughput. However, because the policy used to generate a trajectory can lag behind the policy on the learner by several updates at the time of gradient calculation, learning becomes off-policy. The V-trace off-policy actor-critic algorithm is used to correct for this harmful discrepancy.

Source	Jukebox: A Generative Model for Music
Year	2000
Data Source	CC BY-SA - https://paperswithcode.com