What is: wav2vec Unsupervised?

wav2vec-U is an unsupervised method to train speech recognition models without any labeled data. It leverages self-supervised speech representations to segment unlabeled language and learn a mapping from these representations to phonemes via adversarial training.

Specifically, we learn self-supervised representations with wav2vec 2.0 on unlabeled speech audio, then identify clusters in the representations with k-means to segment the audio data. Next, we build segment representations by mean pooling the wav2vec 2.0 representations, performing PCA and a second mean pooling step between adjacent segments. This is input to the generator which outputs a phoneme sequence that is fed to the discriminator, similar to phonemized unlabeled text to perform adversarial training.

Source	Unsupervised Speech Recognition
Year	2000
Data Source	CC BY-SA - https://paperswithcode.com