What is: PAR Transformer?
| Source | Pay Attention when Required |
| Year | 2020 |
| Data Source | CC BY-SA - https://paperswithcode.com |
PAR Transformer is a Transformer model that uses 63% fewer self-attention blocks, replacing them with feed-forward blocks, while retaining test accuracy. It is based on the Transformer-XL architecture and uses neural architecture search to find an efficient pattern of blocks in the transformer architecture.
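The core idea can be sketched as a stack in which the per-layer block type is chosen from a layout rather than always being attention followed by feed-forward. Below is a minimal NumPy sketch, not the paper's implementation: the block functions are simplified (single head, no projections, no layer norm), and the example layout placing attention blocks early is an assumption for illustration, since the paper discovers its pattern via neural architecture search.

```python
import numpy as np

def feed_forward(x, w1, w2):
    # Position-wise feed-forward block with ReLU (simplified sketch).
    return np.maximum(x @ w1, 0) @ w2

def self_attention(x):
    # Minimal single-head self-attention without learned projections.
    scores = x @ x.T / np.sqrt(x.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

def par_stack(x, layout, ff_params):
    # Apply blocks in the order given by `layout`, with residual connections.
    # In a PAR-style layout only a fraction of entries are "attn"; the rest
    # are feed-forward blocks.
    for kind in layout:
        if kind == "attn":
            x = x + self_attention(x)
        else:
            w1, w2 = ff_params
            x = x + feed_forward(x, w1, w2)
    return x

rng = np.random.default_rng(0)
d_model, d_ff, seq_len = 8, 16, 4
ff_params = (rng.normal(size=(d_model, d_ff)) * 0.1,
             rng.normal(size=(d_ff, d_model)) * 0.1)

# Hypothetical layout: one attention block per three layers (~1/3 attention),
# compared with a uniform stack that alternates attention and feed-forward.
layout = ["attn", "ff", "ff"] * 2
out = par_stack(rng.normal(size=(seq_len, d_model)), layout, ff_params)
```

The layout list is the searchable design space: a conventional Transformer corresponds to `["attn", "ff"] * n`, while PAR's search finds layouts with far fewer `"attn"` entries that match accuracy at lower cost.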