What is: Ternary Weight Splitting?
Source | BinaryBERT: Pushing the Limit of BERT Quantization |
Year | 2021 |
Data Source | CC BY-SA - https://paperswithcode.com |
Ternary Weight Splitting is a ternarization-based initialization approach used in BinaryBERT that exploits the flatness of the ternary loss landscape as an optimization proxy for the binary model. We first train a half-sized ternary BERT to convergence, and then split both the latent full-precision weights and the quantized ternary weights into their binary counterparts via the TWS operator. To inherit the performance of the ternary model after splitting, the TWS operator requires the splitting equivalency (i.e., the same output given the same input):

$$\hat{\mathbf{w}}^{t} = \hat{\mathbf{b}}_{1} + \hat{\mathbf{b}}_{2}$$

where $\hat{\mathbf{w}}^{t}$ is the quantized ternary weight and $\hat{\mathbf{b}}_{1}, \hat{\mathbf{b}}_{2}$ are the two quantized binary weights obtained from the split.
While the solution to the above equation is not unique, we constrain the latent full-precision weights after splitting to satisfy $\mathbf{w}_{1} + \mathbf{w}_{2} = \mathbf{w}^{t}$. See the paper for more details.
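The splitting equivalency on the quantized weights can be illustrated with a minimal NumPy sketch. This is an assumption-laden simplification, not the paper's full operator: the actual TWS also chooses the split of the latent full-precision weights, whereas here we only split an already-quantized ternary tensor with values in $\{-\alpha, 0, +\alpha\}$ into two binary tensors with values in $\{-\alpha/2, +\alpha/2\}$ whose sum reproduces it exactly.

```python
import numpy as np

def split_ternary(w_ternary, alpha):
    """Simplified TWS-style split (illustrative, not the paper's exact operator).

    Splits a ternary tensor with values in {-alpha, 0, +alpha} into two
    binary tensors b1, b2 with values in {-alpha/2, +alpha/2} such that
    b1 + b2 == w_ternary, i.e. the splitting equivalency holds exactly.
    """
    beta = alpha / 2.0
    sign = np.sign(w_ternary)
    # Nonzero entries: both binary halves share the ternary sign, summing
    # to +/-alpha. Zero entries: the halves take opposite signs and cancel.
    b1 = np.where(sign != 0, sign * beta, beta)
    b2 = np.where(sign != 0, sign * beta, -beta)
    return b1, b2

# Toy example: one ternarized weight row with (hypothetical) scale alpha = 0.8
alpha = 0.8
w_t = np.array([0.8, 0.0, -0.8, 0.0, 0.8])
b1, b2 = split_ternary(w_t, alpha)
assert np.allclose(b1 + b2, w_t)  # same output given the same input
```

Because `b1 + b2` equals the ternary weight elementwise, the sum of the two binary linear layers produces the same activations as the original ternary layer, which is what lets the binary model inherit the ternary model's performance at initialization.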