What is Multi-band MelGAN?

Source: Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech
Year: 2020
Data Source: CC BY-SA - https://paperswithcode.com

Multi-band MelGAN (MB-MelGAN) is a waveform generation model for high-quality text-to-speech. It improves on the original MelGAN in three ways. First, it enlarges the generator's receptive field, which has been shown to benefit speech generation. Second, it replaces the feature matching loss with a multi-resolution STFT loss, which better measures the difference between generated and real speech. Third, it extends MelGAN with multi-band processing: the generator takes mel-spectrograms as input and produces sub-band signals, which are then recombined into a full-band signal that serves as the discriminator input.
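
To make the multi-resolution STFT loss more concrete, here is a minimal NumPy sketch. It assumes the commonly used formulation: a spectral convergence term plus an L1 distance between log magnitudes, averaged over several STFT resolutions. The function names, the three (FFT size, hop, window length) settings, and the clamping constant are illustrative assumptions and are not taken from the description above; a training setup would use a differentiable framework rather than NumPy.

```python
import numpy as np


def stft_magnitude(x, fft_size, hop, win_length):
    """Magnitude STFT with a Hann window; only frames that fit fully are kept."""
    window = np.hanning(win_length)
    n_frames = 1 + (len(x) - win_length) // hop
    frames = np.stack(
        [x[i * hop : i * hop + win_length] * window for i in range(n_frames)]
    )
    # rfft zero-pads each frame from win_length up to fft_size.
    spec = np.fft.rfft(frames, n=fft_size, axis=-1)
    # Clamp so the logarithm below stays finite (assumed constant).
    return np.maximum(np.abs(spec), 1e-7)


def stft_loss(real, fake, fft_size, hop, win_length):
    """Spectral convergence + log-magnitude L1 at a single resolution."""
    mag_real = stft_magnitude(real, fft_size, hop, win_length)
    mag_fake = stft_magnitude(fake, fft_size, hop, win_length)
    # Frobenius norm of the magnitude difference, relative to the real signal.
    spectral_convergence = np.linalg.norm(mag_real - mag_fake) / np.linalg.norm(mag_real)
    log_magnitude = np.mean(np.abs(np.log(mag_real) - np.log(mag_fake)))
    return spectral_convergence + log_magnitude


# Hypothetical resolutions: (fft_size, hop, win_length).
RESOLUTIONS = [(1024, 120, 600), (2048, 240, 1200), (512, 50, 240)]


def multi_resolution_stft_loss(real, fake, resolutions=RESOLUTIONS):
    """Average the single-resolution loss over all resolutions."""
    return float(np.mean([stft_loss(real, fake, *r) for r in resolutions]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.standard_normal(4000)
    fake = real + 0.1 * rng.standard_normal(4000)
    print(multi_resolution_stft_loss(real, real))  # identical signals
    print(multi_resolution_stft_loss(real, fake))  # perturbed signal
```

Comparing spectra at several window sizes means no single time-frequency trade-off dominates: short windows capture transients, while long windows resolve harmonic structure.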