What is: Beneš Block with Residual Switch Units?

Source: Residual Shuffle-Exchange Networks for Fast Processing of Long Sequences
Year: 2020
Data Source: CC BY-SA - https://paperswithcode.com

The Beneš block is a computationally efficient alternative to dense attention that models long-range dependencies in O(n log n) time, where n is the sequence length. By comparison, dense attention, which is commonly used in Transformers, has O(n^2) complexity.
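To give a feel for the gap between the two growth rates, here is a minimal Python sketch that counts idealized operations for each. It assumes the sequence length is a power of two and ignores all constant factors, so the numbers are orders of growth rather than measured costs; the function names are illustrative and do not come from the paper.

```python
def dense_attention_ops(n: int) -> int:
    """Idealized pairwise-interaction count for dense attention: n^2."""
    return n * n


def shuffle_exchange_ops(n: int) -> int:
    """Idealized count for an O(n log n) architecture: n * log2(n).

    Assumes n is a power of two so that log2(n) is an exact integer.
    """
    if n < 1 or n & (n - 1) != 0:
        raise ValueError("n must be a power of two")
    log2_n = n.bit_length() - 1  # exact log2 for powers of two
    return n * log2_n


def speedup_ratio(n: int) -> float:
    """How many times fewer idealized operations the n log n count needs."""
    return dense_attention_ops(n) / shuffle_exchange_ops(n)
```

For n = 1024 these counts are 1,048,576 versus 10,240, a ratio of about 102. For n = 65,536 the ratio grows to 4,096, which is why the difference matters most for very long sequences.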

In music, dependencies occur on several scales, including a coarse scale that requires processing very long sequences. Beneš blocks have been used in Residual Shuffle-Exchange Networks to achieve state-of-the-art results in music transcription.

Beneš blocks have a ‘receptive field’ the size of the whole sequence, and they have no bottleneck. These properties hold for dense attention but have not been shown for many sparse attention and dilated convolutional architectures.
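The receptive-field property can be illustrated with a small connectivity sketch in Python. It is not the Residual Switch Unit itself: a switch layer is modelled only as "each adjacent pair of positions mixes", with no learned weights or residual connections, and the permutation is assumed to be the perfect shuffle (a cyclic left rotation of the position's bit index). The sketch also models only a forward stack of switch and shuffle layers, whereas a Beneš network classically chains a forward half with a mirrored half. The function names are illustrative, and the sequence length is assumed to be a power of two.

```python
def perfect_shuffle(i: int, bits: int) -> int:
    """Cyclically rotate the `bits`-bit index i left by one position."""
    mask = (1 << bits) - 1
    return ((i << 1) | (i >> (bits - 1))) & mask


def layers_until_full_receptive_field(n: int) -> int:
    """Count switch+shuffle layers until every position depends on all n inputs.

    Assumption: a switch unit lets both outputs of an adjacent pair
    (2k, 2k+1) depend on both inputs of that pair.
    """
    if n < 1 or n & (n - 1) != 0:
        raise ValueError("n must be a power of two")
    bits = n.bit_length() - 1
    # deps[p] is the set of input positions that position p depends on.
    deps = [{p} for p in range(n)]
    layers = 0
    while any(len(d) < n for d in deps):
        # Switch layer: each adjacent pair shares its dependencies.
        mixed = []
        for k in range(0, n, 2):
            union = deps[k] | deps[k + 1]
            mixed.extend([union, union])
        # Shuffle layer: move the element at position p to perfect_shuffle(p).
        shuffled = [set() for _ in range(n)]
        for p in range(n):
            shuffled[perfect_shuffle(p, bits)] = mixed[p]
        deps = shuffled
        layers += 1
    return layers
```

Under these assumptions, a sequence of length 8 reaches a full receptive field after 3 layers and a sequence of length 1024 after 10, that is, after log2(n) layers. Since each layer touches every position once, this is consistent with covering the whole sequence in O(n log n) work.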