AICurious Logo

What is: ParaNet?

SourceNon-Autoregressive Neural Text-to-Speech
Year2000
Data SourceCC BY-SA - https://paperswithcode.com

ParaNet is a non-autoregressive attention-based architecture for text-to-speech, which is fully convolutional and converts text to mel spectrogram. ParaNet distills the attention from the autoregressive text-to-spectrogram model, and iteratively refines the alignment between text and spectrogram in a layer-by-layer manner. The architecture is otherwise similar to Deep Voice 3 except these changes to the decoder; whereas the decoder of DV3 has multiple attention-based layers, where each layer consists of a causal convolution block followed by an attention block, ParaNet has a single attention block in the encoder.