What is: WaveVAE?
Source | Non-Autoregressive Neural Text-to-Speech |
Year | 2019 |
Data Source | CC BY-SA - https://paperswithcode.com |
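WaveVAE is a generative audio model that can be used as a vocoder in text-to-speech systems. It is a VAE-based model that can be trained from scratch by jointly optimizing the encoder $q_{\phi}(\mathbf{z}|\mathbf{x}, \mathbf{c})$ and the decoder $p_{\theta}(\mathbf{x}|\mathbf{z}, \mathbf{c})$, where $\mathbf{z}$ denotes the latent variables and $\mathbf{c}$ is the mel spectrogram conditioner.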
The encoder of WaveVAE is parameterized by a Gaussian autoregressive WaveNet that maps the ground-truth audio $\mathbf{x}$ to a latent representation $\mathbf{z}$ of the same length. The decoder is parameterized by the one-step-ahead predictions from an inverse autoregressive flow (IAF).
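The data flow described above can be illustrated with a toy NumPy sketch. It is not the WaveVAE implementation: the helper `causal_stats` (a shifted causal convolution) stands in for the WaveNet stacks, the kernel values and `noise_scale` are made-up illustrative parameters, and the mel spectrogram conditioner is omitted for brevity. It only shows that the encoder yields a latent sequence as long as the audio, and that an IAF-style decoder can compute the Gaussian parameters of every one-step-ahead prediction in parallel, because the statistics at step t depend only on latents before t.

```python
import numpy as np

def causal_stats(signal, kernel, bias=0.0):
    """Toy autoregressive 'network': output at step t depends only on signal[<t]."""
    # Shift right by one sample so position t never sees signal[t] itself.
    shifted = np.concatenate(([0.0], signal[:-1]))
    # Causal convolution: full convolution truncated to the input length.
    return np.convolve(shifted, kernel)[: len(signal)] + bias

def encode(x, rng, noise_scale=0.1):
    """Hypothetical Gaussian autoregressive encoder: audio x -> same-length latent z."""
    mu = causal_stats(x, kernel=np.array([0.5, 0.25]))
    sigma = np.exp(causal_stats(x, kernel=np.array([0.1])))
    mean_z = (x - mu) / sigma  # per-step posterior mean (an assumption of this sketch)
    # Reparameterized sample: posterior mean plus scaled Gaussian noise.
    return mean_z + noise_scale * rng.standard_normal(x.shape)

def decode(z):
    """Hypothetical IAF-style decoder: Gaussian parameters of p(x_t | z_<t) for all t."""
    mu = causal_stats(z, kernel=np.array([0.5, 0.25]))
    sigma = np.exp(causal_stats(z, kernel=np.array([0.1])))
    return mu, sigma

rng = np.random.default_rng(0)
x = np.sin(np.linspace(0.0, 2.0 * np.pi, 16))  # stand-in for ground-truth audio
z = encode(x, rng)
mu_x, sigma_x = decode(z)  # one-step-ahead predictions, computed in parallel
```

Because `decode` needs only the already-known latents, no sequential sampling loop over time steps is required, which is the property that makes a flow of this kind attractive for fast synthesis.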
The training objective is the evidence lower bound (ELBO) on the log-likelihood of the observed audio $\mathbf{x}$ in the VAE.
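For reference, the ELBO of a conditional VAE takes the standard form below. This is the generic textbook bound written in the notation of this entry, with $\phi$ and $\theta$ as the encoder and decoder parameters. The prior $p(\mathbf{z})$ is an assumption here, since the text above does not specify it, and any weighting or annealing of the KL term used in practice is not shown.

```latex
\log p_{\theta}(\mathbf{x} \mid \mathbf{c})
\;\ge\;
\mathbb{E}_{q_{\phi}(\mathbf{z} \mid \mathbf{x}, \mathbf{c})}
\big[ \log p_{\theta}(\mathbf{x} \mid \mathbf{z}, \mathbf{c}) \big]
\;-\;
\mathrm{KL}\big( q_{\phi}(\mathbf{z} \mid \mathbf{x}, \mathbf{c}) \,\|\, p(\mathbf{z}) \big)
```

The first term rewards accurate reconstruction of the audio from the latents, while the KL term keeps the encoder's posterior close to the prior. Maximizing the right-hand side with respect to both $\phi$ and $\theta$ corresponds to the joint optimization of encoder and decoder.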