What is: PAR Transformer?
| Source | Pay Attention when Required |
| Year | 2020 |
| Data Source | CC BY-SA - https://paperswithcode.com |
PAR Transformer is a Transformer model that uses 63% fewer self-attention blocks, replacing them with feed-forward blocks, while retaining test accuracy. It is based on the Transformer-XL architecture and uses neural architecture search to find an efficient pattern of blocks in the transformer architecture.
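The core idea can be sketched as a stack in which the per-layer block type is chosen from a layout rather than always being attention followed by feed-forward. Below is a minimal NumPy sketch, not the paper's implementation: the block functions are simplified (single head, no projections, no layer norm), and the example layout placing attention blocks early is an assumption for illustration, since the paper discovers its pattern via neural architecture search.

```python
import numpy as np

def feed_forward(x, w1, w2):
    # Position-wise feed-forward block with ReLU (simplified sketch).
    return np.maximum(x @ w1, 0) @ w2

def self_attention(x):
    # Minimal single-head self-attention without learned projections.
    scores = x @ x.T / np.sqrt(x.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

def par_stack(x, layout, ff_params):
    # Apply blocks in the order given by `layout`, with residual connections.
    # In a PAR-style layout only a fraction of entries are "attn"; the rest
    # are feed-forward blocks.
    for kind in layout:
        if kind == "attn":
            x = x + self_attention(x)
        else:
            w1, w2 = ff_params
            x = x + feed_forward(x, w1, w2)
    return x

rng = np.random.default_rng(0)
d_model, d_ff, seq_len = 8, 16, 4
ff_params = (rng.normal(size=(d_model, d_ff)) * 0.1,
             rng.normal(size=(d_ff, d_model)) * 0.1)

# Hypothetical layout: one attention block per three layers (~1/3 attention),
# compared with a uniform stack that alternates attention and feed-forward.
layout = ["attn", "ff", "ff"] * 2
out = par_stack(rng.normal(size=(seq_len, d_model)), layout, ff_params)
```

The layout list is the searchable design space: a conventional Transformer corresponds to `["attn", "ff"] * n`, while PAR's search finds layouts with far fewer `"attn"` entries that match accuracy at lower cost.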