What is: Neighborhood Attention?
Source | Neighborhood Attention Transformer |
Year | 2022 |
Data Source | CC BY-SA - https://paperswithcode.com |
Neighborhood Attention is a restricted self-attention pattern in which each token's receptive field is limited to its nearest neighboring pixels. It was proposed in the Neighborhood Attention Transformer paper as an alternative to other local attention mechanisms used in hierarchical vision transformers.
NA is similar in concept to Stand-Alone Self-Attention (SASA), in that both can be implemented as a raster-scan sliding-window operation over key-value pairs. However, NA requires a modification to handle corner pixels: instead of zero-padding the window at image boundaries, the window is shifted inward so that every query still attends to the same number of neighbors. This keeps the receptive field size fixed for every query and increases the number of possible relative positions between a query and its keys.
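To make the corner handling concrete, below is a minimal, single-head PyTorch sketch of 2D neighborhood attention, written for clarity rather than speed. The function name, the (B, H, W, C) layout, and the omission of relative positional bias and multiple heads are my own simplifications, not NATTEN's API or the paper's exact formulation.

```python
import torch

def neighborhood_attention_2d(q, k, v, kernel_size=7):
    """q, k, v: (B, H, W, C), with H, W >= kernel_size.

    Each query attends to a kernel_size x kernel_size neighborhood. Near the
    borders the window is shifted inward (clamped) rather than zero-padded, so
    every query sees exactly kernel_size**2 keys -- the fixed receptive field
    that distinguishes NA's corner handling from SASA's.
    """
    B, H, W, C = q.shape
    ks, half = kernel_size, kernel_size // 2

    # Top-left corner of each query's window, clamped to stay in bounds.
    y0 = (torch.arange(H) - half).clamp(0, H - ks)   # (H,)
    x0 = (torch.arange(W) - half).clamp(0, W - ks)   # (W,)

    # Absolute row/column indices of every neighbor for every query position.
    ny = y0[:, None] + torch.arange(ks)              # (H, ks)
    nx = x0[:, None] + torch.arange(ks)              # (W, ks)

    def gather(t):
        # (B, H, W, C) -> (B, H, W, ks*ks, C): the neighborhood of each query.
        t = t[:, ny]                  # (B, H, ks, W, C)
        t = t[:, :, :, nx]            # (B, H, ks, W, ks, C)
        return t.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, ks * ks, C)

    k_nb, v_nb = gather(k), gather(v)

    # Scaled dot-product attention restricted to each query's neighborhood.
    scores = torch.einsum('bhwc,bhwnc->bhwn', q, k_nb) / C ** 0.5
    attn = scores.softmax(dim=-1)
    return torch.einsum('bhwn,bhwnc->bhwc', attn, v_nb)


# Example: a 14x14 feature map with a 7x7 neighborhood per query.
x = torch.randn(2, 14, 14, 64)
out = neighborhood_attention_2d(x, x, x, kernel_size=7)
print(out.shape)  # torch.Size([2, 14, 14, 64])
```

Note that this sketch materializes every query's neighborhood explicitly, which is exactly the memory-hungry approach the next paragraph describes as intractable at scale.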
The primary challenge in experimenting with both NA and SASA has been computation. Naively extracting the key-value pairs for each query is slow, consumes a large amount of memory, and eventually becomes intractable at scale. NA was therefore implemented through a new CUDA extension to PyTorch, NATTEN.
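For reference, a usage sketch of NATTEN's module-based interface is shown below. The class and argument names here reflect one version of the library and may differ across NATTEN releases, so treat this as an assumption and consult the NATTEN documentation for the installed version.

```python
import torch
from natten import NeighborhoodAttention2D  # pip install natten

# Multi-head neighborhood attention over a channels-last feature map.
# Argument names (dim, num_heads, kernel_size) are assumed from one NATTEN release.
na = NeighborhoodAttention2D(dim=128, num_heads=4, kernel_size=7)

x = torch.randn(2, 28, 28, 128)   # (batch, height, width, channels)
y = na(x)                         # output has the same shape as the input
print(y.shape)                    # torch.Size([2, 28, 28, 128])
```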