What is: Tacotron?
Source | Tacotron: Towards End-to-End Speech Synthesis |
Year | 2017 |
Data Source | CC BY-SA - https://paperswithcode.com |
Tacotron is an end-to-end generative text-to-speech model that takes a character sequence as input and outputs the corresponding spectrogram. Its backbone is a seq2seq (sequence-to-sequence) model with attention. The figure depicts the model, which consists of an encoder, an attention-based decoder, and a post-processing net. At a high level, the model takes characters as input and produces spectrogram frames, which are then converted to waveforms.
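
The data flow described above (characters, encoder, attention-based decoder, post-processing net) can be sketched in a few lines of plain Python. This is only an illustrative toy under stated assumptions, not the architecture from the paper: the weights are random and untrained, the encoder and post-processing net are reduced to single linear maps, dot-product attention is swapped in for simplicity, and the character set, layer sizes, and function names (`encode`, `decode`, `postprocess`) are all hypothetical. The final spectrogram-to-waveform step is omitted.

```python
import math
import random

random.seed(0)

VOCAB = "abcdefghijklmnopqrstuvwxyz '.,?!"    # hypothetical character set
EMBED_DIM, HIDDEN_DIM, FRAME_DIM = 8, 16, 80  # illustrative sizes only


def rand_matrix(rows, cols, scale=0.1):
    """Random, untrained weights; a real model would learn these."""
    return [[random.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def encode(text, embedding, w_enc):
    """Stand-in encoder: one hidden vector per input character."""
    ids = [VOCAB.index(ch) for ch in text.lower() if ch in VOCAB]
    return [[math.tanh(v) for v in matvec(w_enc, embedding[i])] for i in ids]


def decode(memory, w_state, w_out, n_frames):
    """Attention-based decoder: every step attends over all encoder states."""
    state = [0.0] * HIDDEN_DIM
    frames, alignments = [], []
    for _ in range(n_frames):
        # Assumption: plain dot-product attention scores.
        scores = [sum(m * s for m, s in zip(vec, state)) for vec in memory]
        weights = softmax(scores)
        context = [sum(w * vec[d] for w, vec in zip(weights, memory))
                   for d in range(HIDDEN_DIM)]
        state = [math.tanh(v) for v in matvec(w_state, state + context)]
        frames.append(matvec(w_out, state))   # one spectrogram frame
        alignments.append(weights)
    return frames, alignments


def postprocess(frames, w_post):
    """Stand-in post-processing net: maps each decoder frame to an output frame."""
    return [matvec(w_post, frame) for frame in frames]


embedding = rand_matrix(len(VOCAB), EMBED_DIM, scale=1.0)
w_enc = rand_matrix(HIDDEN_DIM, EMBED_DIM)
w_state = rand_matrix(HIDDEN_DIM, 2 * HIDDEN_DIM)
w_out = rand_matrix(FRAME_DIM, HIDDEN_DIM)
w_post = rand_matrix(FRAME_DIM, FRAME_DIM)

memory = encode("hello world", embedding, w_enc)
frames, alignments = decode(memory, w_state, w_out, n_frames=5)
spectrogram = postprocess(frames, w_post)
print(len(memory), len(frames), len(spectrogram[0]))  # 11 5 80
```

Each row of `alignments` is a probability distribution over the input characters, which is what lets the decoder decide which part of the text each output frame should draw on. In a trained model the number of decoder steps would not be fixed in advance as `n_frames` is here.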