What is: CANINE?
Source | CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation |
Year | 2021 |
Data Source | CC BY-SA - https://paperswithcode.com |
CANINE is a pre-trained encoder for language understanding that operates directly on character sequences, with no explicit tokenization step and no vocabulary. It is paired with a pre-training strategy that uses soft inductive biases in place of hard token boundaries. To use its finer-grained input effectively and efficiently, CANINE combines downsampling, which reduces the input sequence length, with a deep transformer stack, which encodes context.
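The two ideas in the description, vocabulary-free character input and downsampling before the deep stack, can be illustrated with a minimal sketch. This is not CANINE's implementation: `toy_embed` is a hypothetical stand-in for learned character embeddings, mean pooling is a simplified substitute for a learned downsampling layer, and the rate of 4 is an illustrative choice.

```python
def to_codepoints(text):
    """Map each character to its Unicode code point; no vocabulary lookup is needed."""
    return [ord(ch) for ch in text]


def toy_embed(codepoint, dim=4):
    """Hypothetical stand-in for a learned embedding: a deterministic toy vector."""
    return [((codepoint * (i + 1)) % 97) / 97.0 for i in range(dim)]


def downsample(vectors, rate):
    """Mean-pool non-overlapping windows of `rate` positions (assumed simplification).

    The last window may be shorter than `rate` if the length is not a multiple of it.
    """
    pooled = []
    for start in range(0, len(vectors), rate):
        window = vectors[start:start + rate]
        dim = len(window[0])
        pooled.append([sum(v[i] for v in window) / len(window) for i in range(dim)])
    return pooled


text = "tokenization-free"
ids = to_codepoints(text)                     # one integer per character
char_vectors = [toy_embed(cp) for cp in ids]  # one vector per character
short = downsample(char_vectors, rate=4)      # fewer positions for the deep stack

print(len(ids), "character positions ->", len(short), "downsampled positions")
```

The point of the sketch is the length reduction: the expensive context-encoding layers would see roughly one position per `rate` characters rather than one per character.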