What is: Beneš Block with Residual Switch Units?
Source | Residual Shuffle-Exchange Networks for Fast Processing of Long Sequences |
Year | 2020 |
Data Source | CC BY-SA - https://paperswithcode.com |
The Beneš block is a computationally efficient alternative to dense attention that models long-range dependencies in O(n log n) time for a sequence of length n. By comparison, dense attention, which is commonly used in Transformers, has O(n^2) complexity.
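To make the gap between the two complexity classes concrete, the following minimal sketch counts pairwise operations under a simplified cost model. The function names and the `layers_per_log` parameter are hypothetical, and the assumption that a block uses about 2 log2(n) switch layers of n/2 two-input units each is an illustration, not a figure taken from the source.

```python
def dense_attention_pairs(n: int) -> int:
    """Number of query-key pairs scored by dense attention on n positions."""
    return n * n


def benes_switch_units(n: int, layers_per_log: int = 2) -> int:
    """Approximate number of two-input switch units in one block.

    Assumption (not from the source): the block has about
    layers_per_log * log2(n) switch layers, each with n / 2 units.
    """
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two, at least 2")
    log_n = n.bit_length() - 1  # exact log2 for powers of two
    return (n // 2) * layers_per_log * log_n


if __name__ == "__main__":
    for n in (1024, 16384, 262144):
        print(n, dense_attention_pairs(n), benes_switch_units(n))
```

Under this model the ratio between the two counts grows as n / log2(n), which is why the difference matters most for very long sequences.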
In music, dependencies occur on several scales, including a coarse scale that requires processing very long sequences. Beneš blocks have been used in Residual Shuffle-Exchange Networks to achieve state-of-the-art results in music transcription.
Beneš blocks have a ‘receptive field’ the size of the whole sequence, and they have no bottleneck. Both properties also hold for dense attention, but they have not been shown for many sparse attention and dilated convolutional architectures.
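The receptive-field property can be checked on connectivity alone. The sketch below is a hedged illustration rather than the authors' implementation. It assumes that a switch layer mixes adjacent pairs of positions and that layers are joined by a perfect shuffle (a cyclic left rotation of the index bits), and it tracks only which inputs each position depends on, not any learned computation. The names `rotate_left` and `receptive_fields` are hypothetical.

```python
def rotate_left(i: int, bits: int) -> int:
    """Perfect-shuffle index map: rotate the bits-wide index left by one."""
    msb = (i >> (bits - 1)) & 1
    return ((i << 1) & ((1 << bits) - 1)) | msb


def receptive_fields(n: int, num_switch_layers: int) -> list:
    """Return, for each position, the set of input indices it depends on.

    Assumption: each switch layer mixes positions (2j, 2j + 1), and a
    perfect shuffle is applied between consecutive switch layers.
    """
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two, at least 2")
    bits = n.bit_length() - 1
    deps = [{i} for i in range(n)]
    for layer in range(num_switch_layers):
        # Switch layer: both outputs of a unit depend on both of its inputs.
        for j in range(0, n, 2):
            merged = deps[j] | deps[j + 1]
            deps[j] = merged
            deps[j + 1] = merged
        # Shuffle between switch layers: move position p to rotate_left(p).
        if layer < num_switch_layers - 1:
            shuffled = [None] * n
            for p in range(n):
                shuffled[rotate_left(p, bits)] = deps[p]
            deps = shuffled
    return deps


if __name__ == "__main__":
    n = 16
    for layers in range(1, 5):
        sizes = {len(d) for d in receptive_fields(n, layers)}
        print(layers, sizes)
```

In this model each switch layer doubles the number of inputs a position can see, so log2(n) switch layers are enough for every position to depend on the whole sequence. The sketch covers connectivity only and says nothing about the no-bottleneck property.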