**WaveVAE** is a generative audio model that can be used as a vocoder in text-to-speech systems. It is a [VAE](https://paperswithcode.com/method/vae)-based model that can be trained from scratch by jointly optimizing the encoder $q_{\phi}\left(\mathbf{z}|\mathbf{x}, \mathbf{c}\right)$ and the decoder $p_{\theta}\left(\mathbf{x}|\mathbf{z}, \mathbf{c}\right)$, where $\mathbf{z}$ denotes the latent variables and $\mathbf{c}$ is the mel spectrogram conditioner.

The encoder of WaveVAE, $q_{\phi}\left(\mathbf{z}|\mathbf{x}\right)$, is parameterized by a Gaussian autoregressive [WaveNet](https://paperswithcode.com/method/wavenet) that maps the ground-truth audio $\mathbf{x}$ into a latent representation $\mathbf{z}$ of the same length. The decoder, $p_{\theta}\left(\mathbf{x}|\mathbf{z}\right)$, is parameterized by the one-step-ahead predictions from an inverse autoregressive flow. In both expressions the conditioner $\mathbf{c}$ is omitted from the notation.
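
To make the idea of an inverse autoregressive flow concrete, here is a toy sketch of a single affine step, $x_t = z_t \cdot \sigma_t + \mu_t$, in which the shift and scale at position $t$ depend only on earlier latents. The function name `toy_iaf_step` and the weights `w_mu` and `w_log_sigma` are hypothetical, and the one-sample linear context stands in for the WaveNet-style network a real model would use. It illustrates the causal structure only and is not the WaveVAE decoder itself.

```python
import numpy as np

def toy_iaf_step(z, w_mu=0.5, w_log_sigma=0.1):
    """Toy affine IAF-style transform: x_t = z_t * sigma_t + mu_t.

    mu_t and sigma_t depend only on z_{t-1} (an assumption made for
    brevity; a real flow would condition on all of z_{<t}).
    """
    z = np.asarray(z, dtype=float)
    # Shift right by one so position t sees only the previous latent.
    z_prev = np.concatenate([[0.0], z[:-1]])
    mu = w_mu * z_prev                     # hypothetical shift "network"
    sigma = np.exp(w_log_sigma * z_prev)   # positive scale via exp
    # All positions are computed at once, since every z_t is already known.
    return z * sigma + mu

z = np.array([1.0, 2.0, -1.0])
x = toy_iaf_step(z)
```

Because every output position depends only on latents that are available up front, the whole sequence can be produced in one parallel pass, which is the property that makes this family of flows attractive for fast synthesis.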

The training objective is the ELBO for the observed $\mathbf{x}$ in the VAE.
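
In its standard form, the ELBO is $\mathbb{E}_{q_{\phi}\left(\mathbf{z}|\mathbf{x}\right)}\left[\log p_{\theta}\left(\mathbf{x}|\mathbf{z}\right)\right] - \mathrm{KL}\left(q_{\phi}\left(\mathbf{z}|\mathbf{x}\right) \,\|\, p\left(\mathbf{z}\right)\right)$. The sketch below evaluates a single-sample estimate of this quantity under two assumptions that the text above does not state: a standard normal prior $p(\mathbf{z})$ and diagonal Gaussian encoder and decoder distributions. The function `gaussian_elbo` and its argument names are hypothetical.

```python
import numpy as np

def gaussian_elbo(x, x_mean, x_std, z_mean, z_std):
    """Single-sample ELBO estimate with diagonal Gaussians.

    Assumes a standard normal prior p(z) = N(0, I); x_mean and x_std are
    the decoder's predicted mean and standard deviation for x.
    """
    x, x_mean, x_std = (np.asarray(a, dtype=float) for a in (x, x_mean, x_std))
    z_mean, z_std = (np.asarray(a, dtype=float) for a in (z_mean, z_std))
    # Reconstruction term: log N(x; x_mean, x_std^2), summed over time steps.
    log_lik = np.sum(
        -0.5 * np.log(2.0 * np.pi) - np.log(x_std)
        - 0.5 * ((x - x_mean) / x_std) ** 2
    )
    # Closed-form KL(N(z_mean, z_std^2) || N(0, 1)), summed over dimensions.
    kl = np.sum(0.5 * (z_std ** 2 + z_mean ** 2 - 1.0) - np.log(z_std))
    return log_lik - kl

x = np.zeros(4)
elbo = gaussian_elbo(x, x_mean=np.zeros(4), x_std=np.ones(4),
                     z_mean=np.zeros(4), z_std=np.ones(4))
```

Training maximizes this quantity with respect to both the encoder and decoder parameters, which is what joint optimization refers to here.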

Documents often exhibit various forms of degradation, which make them hard to read and substantially degrade the performance of OCR systems. In this paper, we propose an effective end-to-end framework named Document Enhancement Generative Adversarial Networks (DE-GAN) that uses conditional GANs (cGANs) to restore severely degraded document images. To the best of our knowledge, this practice has not been studied within the context of generative adversarial deep networks. We demonstrate that, in different tasks (document clean-up, binarization, deblurring and watermark removal), DE-GAN can produce a high-quality enhanced version of the degraded document. In addition, our approach provides consistent improvements over state-of-the-art methods on the widely used DIBCO 2013, DIBCO 2017 and H-DIBCO 2018 datasets, demonstrating its ability to restore a degraded document image to its ideal condition. The results obtained on a wide variety of degradations show that the proposed model is flexible enough to be applied to other document enhancement problems.

DE-GAN

WaveVAE

Non-Autoregressive Neural Text-to-Speech

**Fawkes** is an image cloaking system that helps individuals inoculate their images against unauthorized facial recognition models. Fawkes achieves this by helping users add imperceptible pixel-level changes ("cloaks") to their own photos before releasing them. When used to train facial recognition models, these "cloaked" images produce functional models that consistently cause normal images of the user to be misidentified.

Source	Non-Autoregressive Neural Text-to-Speech
Year	2000
Data Source	CC BY-SA - https://paperswithcode.com