What is: MacBERT?
Source | Revisiting Pre-Trained Models for Chinese Natural Language Processing |
Year | 2020 |
Data Source | CC BY-SA - https://paperswithcode.com |
MacBERT is a Transformer-based model for Chinese NLP that alters RoBERTa in several ways, most notably through a modified masking strategy: instead of masking with the [MASK] token, which never appears in the fine-tuning stage, MacBERT masks a word with one of its similar words. Specifically, MacBERT shares the same pre-training tasks as BERT, with several modifications. For the MLM task, the following modifications are performed:
- Whole word masking and N-gram masking strategies are used to select candidate tokens for masking, with percentages of 40%, 30%, 20%, and 10% for word-level unigrams up to 4-grams.
- Instead of masking with the [MASK] token, which never appears in the fine-tuning stage, similar words are used for masking. A similar word is obtained with the Synonyms toolkit, which is based on word2vec similarity calculations. If an N-gram is selected for masking, similar words are found for each word individually. In rare cases, when no similar word exists, masking degrades to random word replacement.
- A total of 15% of the input words are selected for masking; of these, 80% are replaced with similar words, 10% are replaced with a random word, and the remaining 10% are kept as the original words (see the sketch after this list).
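
Below is a minimal Python sketch of this masking-as-correction procedure under the percentages stated above. The function names `mac_masking` and `get_similar_word` are illustrative, not from the paper; the paper obtains similar words with the Synonyms toolkit (word2vec-based), which is stubbed out here.

```python
import random

# N-gram span lengths and their selection probabilities
# (word-level unigram up to 4-gram), as described above.
NGRAM_LENGTHS = [1, 2, 3, 4]
NGRAM_PROBS = [0.4, 0.3, 0.2, 0.1]


def get_similar_word(word, vocab):
    """Hypothetical stand-in for a Synonyms-toolkit lookup
    (word2vec-based similarity). Returns None when no similar
    word is available, so the caller falls back to a random word."""
    return None  # placeholder: always fall back in this sketch


def mac_masking(words, vocab, mask_ratio=0.15, seed=None):
    """Sketch of MacBERT-style masking: select ~15% of the words
    in n-gram spans, then replace 80% of them with similar words,
    10% with random words, and keep the remaining 10% unchanged."""
    rng = random.Random(seed)
    words = list(words)
    num_to_mask = max(1, int(round(len(words) * mask_ratio)))
    masked = 0
    covered = set()

    positions = list(range(len(words)))
    rng.shuffle(positions)
    for start in positions:
        if masked >= num_to_mask:
            break
        # Choose a span length with the 40/30/20/10 distribution.
        n = rng.choices(NGRAM_LENGTHS, weights=NGRAM_PROBS, k=1)[0]
        span = [i for i in range(start, min(start + n, len(words)))
                if i not in covered]
        for i in span:
            covered.add(i)
            roll = rng.random()
            if roll < 0.8:
                # 80%: similar-word replacement; degrade to a random
                # word when no similar word is available.
                similar = get_similar_word(words[i], vocab)
                words[i] = similar if similar is not None else rng.choice(vocab)
            elif roll < 0.9:
                # 10%: random word replacement.
                words[i] = rng.choice(vocab)
            # else: 10% keep the original word.
            masked += 1
    return words


# Example usage (toy vocabulary, purely illustrative):
vocab = ["苹果", "桌子", "学习", "研究", "语言"]
print(mac_masking("我们 喜欢 学习 自然 语言 处理".split(), vocab, seed=0))
```

Note that a real implementation would operate on WordPiece tokens grouped into whole words (whole word masking) and would replace every subword of a selected word consistently; the sketch works on whitespace-split words only to keep the span-selection and 80/10/10 logic visible.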