What is: Shuffle Transformer?
Source | Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer |
Year | 2021 |
Data Source | CC BY-SA - https://paperswithcode.com |
The Shuffle Transformer Block consists of the Shuffle Multi-Head Self-Attention module (ShuffleMHSA), the Neighbor-Window Connection module (NWC), and the MLP module. To introduce cross-window connections while maintaining the efficient computation of non-overlapping windows, a strategy is proposed that alternates between WMSA and Shuffle-WMSA in consecutive Shuffle Transformer blocks: the first window-based transformer block uses the regular window partition strategy, and the second uses window-based self-attention with spatial shuffle. In addition, the Neighbor-Window Connection module (NWC) is added to each block to enhance connections among neighboring windows. The proposed Shuffle Transformer block can therefore build rich cross-window connections and augment the representation. Finally, the consecutive Shuffle Transformer blocks are computed as:
$$
\begin{aligned}
\hat{z}^{l} &= \mathrm{WMSA}(z^{l-1}) + z^{l-1},\\
\tilde{z}^{l} &= \mathrm{NWC}(\hat{z}^{l}) + \hat{z}^{l},\\
z^{l} &= \mathrm{MLP}(\tilde{z}^{l}) + \tilde{z}^{l},\\
\hat{z}^{l+1} &= \text{Shuffle-WMSA}(z^{l}) + z^{l},\\
\tilde{z}^{l+1} &= \mathrm{NWC}(\hat{z}^{l+1}) + \hat{z}^{l+1},\\
z^{l+1} &= \mathrm{MLP}(\tilde{z}^{l+1}) + \tilde{z}^{l+1},
\end{aligned}
$$

where $\hat{z}^{l}$, $\tilde{z}^{l}$ and $z^{l}$ denote the output features of the (Shuffle-)WMSA module, the Neighbor-Window Connection module and the MLP module for block $l$, respectively; WMSA and Shuffle-WMSA denote window-based multi-head self-attention without and with spatial shuffle, respectively.
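For intuition, here is a minimal PyTorch sketch of the spatial shuffle permutation (not the authors' code). It assumes a flattened 1D token sequence rather than the paper's 2D feature map, and simply swaps the window index with the in-window position, so that each post-shuffle window gathers tokens drawn from different original windows; this is what lets the subsequent window attention mix information across windows.

```python
import torch

def spatial_shuffle(x: torch.Tensor, window_tokens: int) -> torch.Tensor:
    """Regroup tokens across windows (1D sketch of the 2D spatial shuffle)."""
    B, N, C = x.shape
    g = N // window_tokens                 # number of windows
    x = x.view(B, g, window_tokens, C)     # (batch, window, in-window position, channels)
    x = x.transpose(1, 2).contiguous()     # swap window index and in-window position
    return x.view(B, N, C)

def spatial_unshuffle(x: torch.Tensor, window_tokens: int) -> torch.Tensor:
    """Inverse permutation, applied after Shuffle-WMSA to restore the token layout."""
    B, N, C = x.shape
    g = N // window_tokens
    x = x.view(B, window_tokens, g, C)
    x = x.transpose(1, 2).contiguous()
    return x.view(B, N, C)

# Toy check: 8 tokens, windows of 4 -> each shuffled window mixes both original windows.
tokens = torch.arange(8).view(1, 8, 1).float()
print(spatial_shuffle(tokens, 4).flatten().tolist())                         # [0, 4, 1, 5, 2, 6, 3, 7]
print(spatial_unshuffle(spatial_shuffle(tokens, 4), 4).flatten().tolist())   # original order restored
```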
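Building on the same 1D simplification, the sketch below mirrors the block equations above. The names `ShuffleTransformerBlockSketch` and `window_tokens` are illustrative; the NWC is approximated by a 3x3 depth-wise convolution with a residual connection, pre-normalization with LayerNorm is an assumption, and the window attention is a plain `nn.MultiheadAttention` applied independently per window, so treat this as a schematic rather than the reference implementation.

```python
import torch
from torch import nn

class ShuffleTransformerBlockSketch(nn.Module):
    """One block of the consecutive pair: (Shuffle-)WMSA -> NWC -> MLP, each with a residual."""
    def __init__(self, dim: int, num_heads: int, window_tokens: int, shuffle: bool):
        super().__init__()
        self.window_tokens = window_tokens
        self.shuffle = shuffle
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.nwc = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)  # depth-wise conv as NWC (assumed)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, z: torch.Tensor, H: int, W: int) -> torch.Tensor:
        B, N, C = z.shape
        w = self.window_tokens

        # z_hat = (Shuffle-)WMSA(z) + z
        y = self.norm1(z)
        if self.shuffle:  # spatial shuffle: interleave tokens across windows
            y = y.view(B, N // w, w, C).transpose(1, 2).reshape(B, N, C)
        y = y.reshape(B * (N // w), w, C)             # window partition (1D simplification)
        y, _ = self.attn(y, y, y, need_weights=False)  # self-attention within each window
        y = y.reshape(B, N, C)
        if self.shuffle:  # spatial unshuffle: restore the original token order
            y = y.view(B, w, N // w, C).transpose(1, 2).reshape(B, N, C)
        z_hat = z + y

        # z_tilde = NWC(z_hat) + z_hat  (depth-wise conv over the 2D feature map)
        img = z_hat.transpose(1, 2).reshape(B, C, H, W)
        z_tilde = z_hat + self.nwc(img).flatten(2).transpose(1, 2)

        # z = MLP(z_tilde) + z_tilde
        return z_tilde + self.mlp(self.norm2(z_tilde))

# Usage: a consecutive pair, as in the equations (block l without shuffle, block l+1 with it).
dim, H, W = 96, 16, 16
pair = [ShuffleTransformerBlockSketch(dim, 3, 16, shuffle=False),
        ShuffleTransformerBlockSketch(dim, 3, 16, shuffle=True)]
z = torch.randn(2, H * W, dim)
for blk in pair:
    z = blk(z, H, W)
print(z.shape)  # torch.Size([2, 256, 96])
```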