What is: TernaryBERT?

Source: TernaryBERT: Distillation-aware Ultra-low Bit BERT
Year: 2020
Data Source: CC BY-SA - https://paperswithcode.com

TernaryBERT is a Transformer-based model that ternarizes the weights of a pretrained BERT model to {-1, 0, +1}, using different granularities for the word embeddings and for the weights in the Transformer layers. Rather than using knowledge distillation directly to compress a model, TernaryBERT uses it to improve the performance of a ternarized student model that is the same size as the teacher model. In this way, knowledge is transferred from the highly accurate teacher model to the ternarized student model, whose ternary weights give it a smaller capacity.
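
To make the idea of ternarization at different granularities concrete, here is a minimal sketch in plain Python. It is an illustration, not the authors' implementation. It assumes a threshold heuristic in the style of Ternary Weight Networks (threshold = 0.7 times the mean absolute weight), which the text above does not specify, and the function names `ternarize`, `ternarize_rows`, and `ternarize_layer` are hypothetical. The row-wise variant keeps one scaling factor per row (as one might for a word-embedding table), while the layer-wise variant shares a single scaling factor across the whole matrix.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def ternarize(weights: List[float], threshold_ratio: float = 0.7) -> Tuple[float, List[int]]:
    """Approximate `weights` as alpha * t, where every t_i is -1, 0, or +1.

    Assumption: the threshold heuristic (threshold_ratio * mean |w|) is a
    TWN-style choice made for this sketch, not something stated in the text.
    """
    if not weights:
        return 0.0, []
    # Weights whose magnitude is at or below delta are set to 0.
    delta = threshold_ratio * sum(abs(w) for w in weights) / len(weights)
    ternary = [0 if abs(w) <= delta else (1 if w > 0 else -1) for w in weights]
    # alpha is the mean magnitude of the weights that were kept (non-zero).
    kept = [abs(w) for w, t in zip(weights, ternary) if t != 0]
    alpha = sum(kept) / len(kept) if kept else 0.0
    return alpha, ternary


def ternarize_rows(matrix: Matrix) -> List[Tuple[float, List[int]]]:
    """Finer granularity: a separate scaling factor for each row."""
    return [ternarize(row) for row in matrix]


def ternarize_layer(matrix: Matrix) -> Tuple[float, List[List[int]]]:
    """Coarser granularity: one scaling factor shared by the whole matrix."""
    if not matrix:
        return 0.0, []
    n_cols = len(matrix[0])
    flat = [w for row in matrix for w in row]
    alpha, ternary = ternarize(flat)
    # Restore the original row layout.
    rows = [ternary[i:i + n_cols] for i in range(0, len(ternary), n_cols)]
    return alpha, rows


if __name__ == "__main__":
    toy = [[0.9, -0.05], [0.4, -0.8]]
    print("row-wise:  ", ternarize_rows(toy))
    print("layer-wise:", ternarize_layer(toy))
```

On the toy matrix, the two granularities can disagree about the same weight: the value 0.4 falls below the threshold of its own row but above the threshold computed over the whole matrix, so it becomes 0 row-wise and +1 layer-wise. The distillation step that trains the ternarized student against the teacher is not covered by this sketch.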