What is: SpecGAN?
Source | Adversarial Audio Synthesis |
Year | 2018 |
Data Source | CC BY-SA - https://paperswithcode.com |
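SpecGAN is a generative adversarial network (GAN) method for spectrogram-based, frequency-domain audio generation. Because a spectrogram is a two-dimensional, image-like representation, the problem is well suited to GANs designed for image generation. The spectrogram representation is also chosen so that it can be approximately inverted back to audio.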
To process audio into suitable spectrograms, the authors perform the short-time Fourier transform with 16 ms windows and an 8 ms stride, resulting in 128 frequency bins linearly spaced from 0 to 8 kHz. They take the magnitude of the resulting spectra and scale the amplitude values logarithmically to better align with human perception. They then normalize each frequency bin to have zero mean and unit variance. Finally, they clip the spectra to 3 standard deviations and rescale to [-1, 1].
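The preprocessing steps above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code. It assumes a 16 kHz sample rate (implied by the 0 to 8 kHz range), a Hann window, a small `eps` added before the logarithm, dropping the Nyquist bin to go from 129 to 128 bins, and clipping at 3 standard deviations before rescaling to [-1, 1]. All function and parameter names are hypothetical.

```python
import numpy as np

SAMPLE_RATE = 16_000                  # assumed; implied by the 0-8 kHz range
WIN = SAMPLE_RATE * 16 // 1000        # 16 ms window  -> 256 samples
HOP = SAMPLE_RATE * 8 // 1000         # 8 ms stride   -> 128 samples


def log_magnitude_spectrogram(audio, eps=1e-6):
    """Return a (frames, 128) log-magnitude spectrogram of a 1-D signal."""
    audio = np.asarray(audio, dtype=np.float64)
    n_frames = 1 + (len(audio) - WIN) // HOP
    window = np.hanning(WIN)          # window choice is an assumption
    frames = np.stack(
        [audio[i * HOP : i * HOP + WIN] * window for i in range(n_frames)]
    )
    spectrum = np.fft.rfft(frames, axis=1)    # (frames, 129) complex bins
    magnitude = np.abs(spectrum)[:, :-1]      # drop Nyquist bin -> 128 bins
    return np.log(magnitude + eps)            # eps avoids log(0)


def fit_bin_statistics(log_specs):
    """Per-frequency-bin mean and standard deviation over a dataset."""
    stacked = np.concatenate(log_specs, axis=0)
    return stacked.mean(axis=0), stacked.std(axis=0) + 1e-8


def normalize(log_spec, mean, std, clip_sigma=3.0):
    """Standardize each bin, clip to +/- clip_sigma, rescale to [-1, 1]."""
    z = (log_spec - mean) / std
    return np.clip(z, -clip_sigma, clip_sigma) / clip_sigma
```

In practice the per-bin statistics would be computed once over the whole training set and reused for every example, so that the same scaling can later be undone on generated spectrograms.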
They then apply the DCGAN approach to the resulting spectrograms.
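To illustrate why the representation is only approximately invertible, the sketch below maps a generated spectrogram back to a linear-magnitude spectrogram by undoing the rescaling, normalization, and logarithm. It is an assumption-laden example, not the authors' implementation: the clipping level of 3 standard deviations, the [-1, 1] output range, the `eps` offset, and the zero-filled Nyquist bin are all assumed, and the names are hypothetical. Values lost to clipping cannot be restored, and phase is not recovered here; a separate phase-estimation method such as Griffin-Lim would be needed to obtain a waveform.

```python
import numpy as np


def denormalize(generated, mean, std, clip_sigma=3.0, eps=1e-6):
    """Map a (frames, 128) array in [-1, 1] back to linear magnitudes.

    `mean` and `std` are the per-bin statistics used during preprocessing.
    Returns a (frames, 129) array with a zero-filled Nyquist bin appended.
    """
    z = np.asarray(generated, dtype=np.float64) * clip_sigma  # undo rescale
    log_mag = z * std + mean                                  # undo standardization
    magnitude = np.maximum(np.exp(log_mag) - eps, 0.0)        # undo log(mag + eps)
    return np.pad(magnitude, ((0, 0), (0, 1)))                # restore 129 bins
```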