**GBST**, or **Gradient-Based Subword Tokenization**, is a soft, gradient-based tokenization module that automatically learns latent subword representations from characters in a data-driven fashion. Concretely, GBST enumerates candidate subword blocks and scores them with a block scoring network, which yields a position-wise soft selection over the candidates.

In contrast to prior tokenization-free methods, GBST learns interpretable latent subwords, which makes lexical representations easy to inspect, and it is more efficient than other byte-based models.
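The enumerate-score-select idea can be sketched in a few lines of NumPy. This is a minimal illustration, not the reference implementation: the function names are hypothetical, and the specific choices here (mean pooling to form each block, a single linear scoring vector `w`, a softmax over block sizes, padding the last block by repeating the final character) are assumptions that the text above does not spell out.

```python
import numpy as np

def block_candidates(x, block_size):
    """Mean-pool non-overlapping blocks of characters, then repeat each pooled
    vector so there is one candidate per character position."""
    length, dim = x.shape
    pad = (-length) % block_size
    # Assumption: pad by repeating the last row so the final block is full.
    padded = np.concatenate([x, np.repeat(x[-1:], pad, axis=0)], axis=0)
    pooled = padded.reshape(-1, block_size, dim).mean(axis=1)
    return np.repeat(pooled, block_size, axis=0)[:length]

def soft_block_selection(x, w, max_block_size=4):
    """Score every candidate block and mix them position by position.

    x: (length, dim) character embeddings
    w: (dim,) weights of a linear scoring function (assumed form)
    """
    # candidates: (num_block_sizes, length, dim)
    candidates = np.stack(
        [block_candidates(x, b) for b in range(1, max_block_size + 1)]
    )
    scores = candidates @ w                      # (num_block_sizes, length)
    scores = scores - scores.max(axis=0, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=0, keepdims=True)        # softmax over block sizes
    mixed = (weights[..., None] * candidates).sum(axis=0)  # (length, dim)
    return mixed, weights

rng = np.random.default_rng(0)
chars = rng.normal(size=(10, 8))   # 10 characters, 8-dim embeddings
scorer = rng.normal(size=8)
latent, block_weights = soft_block_selection(chars, scorer)
```

Because the selection is a softmax rather than a hard choice, the whole step stays differentiable, and `block_weights` can be inspected to see which block size each position prefers.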

**Anti-Alias Downsampling (AA)** aims to improve the shift-equivariance of deep networks. Max-pooling is inherently composed of two operations: the first densely evaluates the max operator, and the second is naive subsampling. AA inserts a low-pass filter between them, which makes anti-aliasing practical in any existing strided layer, such as strided [convolution](https://paperswithcode.com/method/convolution). The amount of smoothing is controlled by the size of the blur kernel: a larger filter produces more blur.
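The three-step decomposition (dense max, blur, subsample) can be sketched in one dimension as follows. The function names are hypothetical, and the binomial blur kernel, reflect padding, and 1D setting are assumptions chosen for brevity rather than details taken from the text.

```python
import numpy as np

def dense_max(x, window=2):
    """Max over a sliding window with stride 1 (the dense half of max-pooling)."""
    x = np.asarray(x, dtype=float)
    padded = np.pad(x, (0, window - 1), mode="edge")
    return np.array([padded[i:i + window].max() for i in range(len(x))])

def binomial_kernel(size):
    """Normalized binomial blur kernel, e.g. size 3 gives [1, 2, 1] / 4."""
    kernel = np.array([1.0])
    for _ in range(size - 1):
        kernel = np.convolve(kernel, [1.0, 1.0])
    return kernel / kernel.sum()

def blur_pool(x, stride=2, kernel_size=3):
    """Low-pass filter the signal, then subsample it."""
    x = np.asarray(x, dtype=float)
    kernel = binomial_kernel(kernel_size)
    left = (kernel_size - 1) // 2
    right = kernel_size - 1 - left
    padded = np.pad(x, (left, right), mode="reflect")
    blurred = np.convolve(padded, kernel, mode="valid")  # same length as x
    return blurred[::stride]

def anti_aliased_max_pool(x, stride=2, kernel_size=3):
    """Dense max, then blur, then subsample."""
    return blur_pool(dense_max(x, window=stride), stride, kernel_size)

signal = [1, 3, 2, 5, 4, 0, 7, 6]
plain = anti_aliased_max_pool(signal, kernel_size=1)     # no blur: ordinary max-pool
smoothed = anti_aliased_max_pool(signal, kernel_size=3)  # blurred before subsampling
```

With `kernel_size=1` the blur is the identity and the function reduces to ordinary strided max-pooling; increasing `kernel_size` widens the binomial filter and therefore the amount of blur.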

Anti-Alias Downsampling: introduced in "Making Convolutional Networks Shift-Invariant Again"

Gradient-Based Subword Tokenization: introduced in "Charformer: Fast Character Transformers via Gradient-based Subword Tokenization"


Source	Charformer: Fast Character Transformers via Gradient-based Subword Tokenization
Year	2021
Data Source	CC BY-SA - https://paperswithcode.com

Viet-Anh on Software

What is: GBST?
