What is: ParaNet?
Source | Non-Autoregressive Neural Text-to-Speech |
Year | 2000 |
Data Source | CC BY-SA - https://paperswithcode.com |
ParaNet is a non-autoregressive attention-based architecture for text-to-speech, which is fully convolutional and converts text to mel spectrogram. ParaNet distills the attention from the autoregressive text-to-spectrogram model, and iteratively refines the alignment between text and spectrogram in a layer-by-layer manner. The architecture is otherwise similar to Deep Voice 3 except these changes to the decoder; whereas the decoder of DV3 has multiple attention-based layers, where each layer consists of a causal convolution block followed by an attention block, ParaNet has a single attention block in the encoder.