What is: SepFormer?
Source | Attention is All You Need in Speech Separation |
Year | 2020 |
Data Source | CC BY-SA - https://paperswithcode.com |
SepFormer is a Transformer-based neural network for speech separation. It learns both short- and long-term dependencies with a multi-scale approach built from transformers, and is mainly composed of multi-head attention and feed-forward layers. The model adopts the dual-path framework introduced by DPRNN, replacing its RNNs with a multi-scale pipeline of transformers. Because the transformers in the dual-path framework process small chunks rather than the full sequence, this design mitigates their quadratic complexity, as sketched below.
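To make the chunking idea concrete, here is a minimal PyTorch sketch of one dual-path block. It is not the SpeechBrain implementation; the chunk length, model dimension, and layer counts are illustrative assumptions rather than the paper's hyperparameters. Intra-chunk attention captures short-term dependencies within each chunk, while inter-chunk attention captures long-term dependencies across chunks, so no attention operation ever runs over the full sequence.

```python
import torch
import torch.nn as nn

class DualPathBlock(nn.Module):
    """Intra-chunk transformer (short-term dependencies) followed by an
    inter-chunk transformer (long-term dependencies). Dimensions are
    illustrative, not the paper's settings."""

    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        make_layer = lambda: nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.intra = nn.TransformerEncoder(make_layer(), num_layers=2)
        self.inter = nn.TransformerEncoder(make_layer(), num_layers=2)

    def forward(self, x):
        # x: (batch, num_chunks S, chunk_len K, d_model D)
        b, s, k, d = x.shape
        # Intra-chunk: attention over the K axis -> O(K^2) per chunk.
        x = self.intra(x.reshape(b * s, k, d)).reshape(b, s, k, d)
        # Inter-chunk: attention over the S axis -> O(S^2) per position.
        x = x.transpose(1, 2).reshape(b * k, s, d)
        x = self.inter(x).reshape(b, k, s, d).transpose(1, 2)
        return x

def chunk(x, chunk_len):
    """Split (batch, time, d_model) into 50%-overlapping chunks."""
    hop = chunk_len // 2
    return x.unfold(1, chunk_len, hop).permute(0, 1, 3, 2)  # (B, S, K, D)

x = torch.randn(1, 16000 // 8, 64)   # e.g. 1 s of encoder frames, stride 8
y = DualPathBlock()(chunk(x, chunk_len=250))
print(y.shape)                        # torch.Size([1, 15, 250, 64])
```

With chunk length K of roughly the square root of the sequence length T, the combined cost of the two attention passes grows far more slowly than the O(T^2) of a single transformer over the whole sequence.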
The model follows the learned-domain masking approach and comprises an encoder, a masking network, and a decoder. The encoder is fully convolutional, while the masking network employs two transformers embedded inside the dual-path processing block. The decoder finally reconstructs the separated signals in the time domain using the masks predicted by the masking network.
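The overall pipeline can be sketched as follows, assuming PyTorch. This is a hedged illustration: the dual-path transformer masking network is stubbed with a small convolutional stack for brevity, and names such as `MaskingSeparator` and all layer sizes are hypothetical.

```python
import torch
import torch.nn as nn

class MaskingSeparator(nn.Module):
    """Encoder -> masking network -> decoder, learned-domain masking."""

    def __init__(self, n_src=2, n_filters=64, kernel_size=16, stride=8):
        super().__init__()
        self.n_src = n_src
        # Fully convolutional encoder: waveform -> learned representation.
        self.encoder = nn.Conv1d(1, n_filters, kernel_size, stride=stride)
        # Stand-in masking network (the dual-path transformer in SepFormer).
        self.masker = nn.Sequential(
            nn.Conv1d(n_filters, n_filters * n_src, 1), nn.ReLU())
        # Decoder: masked representation -> time-domain waveform.
        self.decoder = nn.ConvTranspose1d(n_filters, 1, kernel_size,
                                          stride=stride)

    def forward(self, wav):                              # wav: (batch, time)
        h = torch.relu(self.encoder(wav.unsqueeze(1)))   # (B, F, T')
        masks = self.masker(h).view(
            wav.size(0), self.n_src, -1, h.size(-1))     # (B, n_src, F, T')
        # One mask per source, applied to the shared encoder output.
        separated = [self.decoder(h * m).squeeze(1) for m in masks.unbind(1)]
        return torch.stack(separated, dim=1)             # (B, n_src, time)

model = MaskingSeparator()
mix = torch.randn(2, 16000)        # a batch of two 1 s mixtures at 16 kHz
est = model(mix)
print(est.shape)                   # torch.Size([2, 2, 16000])
```

The design choice mirrors classical time-frequency masking: each source is estimated by element-wise multiplying a predicted mask with a shared encoded representation, except that here the analysis and synthesis filters are learned by the encoder and decoder rather than fixed as an STFT.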