
What is: RoBERTa?

Source: RoBERTa: A Robustly Optimized BERT Pretraining Approach
Year: 2019
Data Source: CC BY-SA - https://paperswithcode.com

RoBERTa is an extension of BERT with changes to the pretraining procedure. The modifications include:

  • training the model longer, with bigger batches, over more data
  • removing the next sentence prediction objective
  • training on longer sequences
  • dynamically changing the masking pattern applied to the training data (see the sketch below)

The authors also collect a large new dataset (CC-News) of comparable size to other privately used datasets, to better control for training set size effects.
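
As a rough illustration of the dynamic masking change, here is a minimal Python sketch that re-samples the MLM mask every time a sequence is batched, instead of fixing it once during preprocessing as in the original BERT setup. The token ids, vocabulary size, and 80/10/10 replacement split are assumptions matching a BERT-style setup, not code from the paper.

```python
import random

MASK_ID = 103          # assumption: [MASK] token id in a BERT-style vocabulary
VOCAB_SIZE = 30522     # assumption: BERT-base vocabulary size
MLM_PROB = 0.15        # fraction of tokens selected for prediction

def dynamically_mask(token_ids):
    """Apply a fresh MLM mask to a sequence.

    Called every time the sequence is batched, so the model sees a
    different masking pattern on each pass over the data (dynamic masking),
    rather than the single pattern fixed at preprocessing time.
    """
    inputs, labels = [], []
    for tok in token_ids:
        if random.random() < MLM_PROB:
            labels.append(tok)                                # predict the original token
            r = random.random()
            if r < 0.8:
                inputs.append(MASK_ID)                        # 80%: replace with [MASK]
            elif r < 0.9:
                inputs.append(random.randrange(VOCAB_SIZE))   # 10%: random token
            else:
                inputs.append(tok)                            # 10%: keep unchanged
        else:
            inputs.append(tok)
            labels.append(-100)                               # conventional ignore index for the loss
    return inputs, labels
```

In practice, libraries such as Hugging Face's `DataCollatorForLanguageModeling` apply the mask at collation time, which has the same effect of producing a new pattern on every epoch.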