What is: DistilBERT?
| Source | DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | 
| Year | 2019 | 
| Data Source | CC BY-SA - https://paperswithcode.com | 
DistilBERT is a small, fast, cheap and light Transformer model based on the BERT architecture. Knowledge distillation is performed during the pre-training phase to reduce the size of a BERT model by 40%. To leverage the inductive biases learned by larger models during pre-training, the authors introduce a triple loss combining language modeling, distillation and cosine-distance losses.
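The triple loss can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' training code. It scores a single masked position. It uses equal loss weights and a softmax temperature of 2.0. All function and parameter names (`alpha_ce`, `alpha_mlm`, `alpha_cos`, `triple_loss`, and the helpers) are hypothetical. The distillation term is the cross-entropy between the teacher's temperature-softened output distribution and the student's. The language-modeling term is the usual masked-LM cross-entropy against the true token. The cosine term is 1 minus the cosine similarity between the student and teacher hidden-state vectors.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: Sequence[float],
                      teacher_logits: Sequence[float],
                      temperature: float = 2.0) -> float:
    """Cross-entropy -sum_i t_i * log(s_i) between softened teacher and student outputs."""
    t = softmax(teacher_logits, temperature)
    s = softmax(student_logits, temperature)
    return -sum(ti * math.log(si) for ti, si in zip(t, s))


def mlm_loss(student_logits: Sequence[float], target_index: int) -> float:
    """Masked-language-modeling cross-entropy against the true token (temperature 1)."""
    p = softmax(student_logits)
    return -math.log(p[target_index])


def cosine_loss(student_hidden: Sequence[float],
                teacher_hidden: Sequence[float]) -> float:
    """1 - cos(h_s, h_t): zero when the two hidden states point in the same direction."""
    dot = sum(a * b for a, b in zip(student_hidden, teacher_hidden))
    norm_s = math.sqrt(sum(a * a for a in student_hidden))
    norm_t = math.sqrt(sum(b * b for b in teacher_hidden))
    return 1.0 - dot / (norm_s * norm_t)


def triple_loss(student_logits: Sequence[float],
                teacher_logits: Sequence[float],
                target_index: int,
                student_hidden: Sequence[float],
                teacher_hidden: Sequence[float],
                alpha_ce: float = 1.0,   # hypothetical weight: distillation term
                alpha_mlm: float = 1.0,  # hypothetical weight: masked-LM term
                alpha_cos: float = 1.0,  # hypothetical weight: cosine term
                temperature: float = 2.0) -> float:
    """Weighted sum of the distillation, masked-LM, and cosine-distance losses."""
    return (alpha_ce * distillation_loss(student_logits, teacher_logits, temperature)
            + alpha_mlm * mlm_loss(student_logits, target_index)
            + alpha_cos * cosine_loss(student_hidden, teacher_hidden))


if __name__ == "__main__":
    # Toy example: a 4-token vocabulary and 3-dimensional hidden states.
    teacher_logits = [2.0, 1.0, 0.1, -1.0]
    student_logits = [1.5, 0.8, 0.3, -0.5]
    loss = triple_loss(student_logits, teacher_logits, target_index=0,
                       student_hidden=[0.2, 0.4, 0.1],
                       teacher_hidden=[0.25, 0.35, 0.05])
    print(f"triple loss: {loss:.4f}")
```

Many implementations compute the distillation term as a KL divergence instead. KL(t‖s) equals the cross-entropy above minus the teacher's entropy, which does not depend on the student, so both forms give the student the same gradients.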
