What is: SpecGAN?

Source: Adversarial Audio Synthesis
Year: 2018
Data Source: CC BY-SA - https://paperswithcode.com

SpecGAN is a generative adversarial network (GAN) method for spectrogram-based, frequency-domain audio generation. It relies on a spectrogram representation that is well suited to GANs designed for image generation and that can be approximately inverted back to audio.

To process audio into suitable spectrograms, the authors perform the short-time Fourier transform with 16 ms windows and an 8 ms stride, resulting in 128 frequency bins linearly spaced from 0 to 8 kHz. They take the magnitude of the resulting spectra and scale the amplitude values logarithmically to better align with human perception. They then normalize each frequency bin to have zero mean and unit variance. Finally, they clip the spectra to 3 standard deviations and rescale to [-1, 1].
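The preprocessing steps above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' code: the function name `specgan_features` is hypothetical, and the 16 kHz sample rate (implied by the 0 to 8 kHz range), the Hann window, the natural logarithm with a small epsilon, dropping the Nyquist bin to get 128 bins, and computing the normalization statistics from a single clip rather than a whole dataset are all assumptions.

```python
import numpy as np

def specgan_features(audio, sample_rate=16000, win_ms=16, hop_ms=8,
                     clip_std=3.0, eps=1e-6):
    """Turn a 1-D audio signal into a SpecGAN-style feature matrix.

    Returns (features, mean, std); features has shape (frames, bins).
    """
    win = int(sample_rate * win_ms / 1000)  # 256 samples at 16 kHz
    hop = int(sample_rate * hop_ms / 1000)  # 128 samples at 16 kHz
    n_frames = 1 + (len(audio) - win) // hop

    # Slice the signal into overlapping frames; Hann window is an assumption.
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = audio[idx] * np.hanning(win)

    # rfft yields win/2 + 1 = 129 bins; dropping the Nyquist bin to keep
    # 128 is an assumption about how the 128 bins are obtained.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))[:, : win // 2]

    # Logarithmic amplitude scaling (eps avoids log(0)).
    log_mag = np.log(magnitude + eps)

    # Zero mean and unit variance per frequency bin. Statistics come from
    # this one clip here; a dataset-wide estimate would be the natural
    # choice for training.
    mean = log_mag.mean(axis=0, keepdims=True)
    std = log_mag.std(axis=0, keepdims=True) + eps
    normalized = (log_mag - mean) / std

    # Clip to 3 standard deviations, then rescale to [-1, 1].
    features = np.clip(normalized, -clip_std, clip_std) / clip_std
    return features, mean, std
```

With these settings and no padding, one second of 16 kHz audio gives a 124 x 128 matrix. The returned `mean` and `std` are kept because undoing the normalization is the first step of any approximate inversion back to a magnitude spectrogram.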

They then apply the DCGAN approach to the resulting spectra.