What is: CPM-2?
Source | CPM-2: Large-scale Cost-effective Pre-trained Language Models |
Year | 2021 |
Data Source | CC BY-SA - https://paperswithcode.com |
CPM-2 is an 11-billion-parameter pre-trained language model based on a standard Transformer architecture, consisting of a bidirectional encoder and a unidirectional decoder. The model is pre-trained on WuDaoCorpus, which contains 2.3 TB of cleaned Chinese data and 300 GB of cleaned English data. The pre-training process of CPM-2 is divided into three stages: Chinese pre-training, bilingual pre-training, and MoE (mixture-of-experts) pre-training. Each stage inherits the weights of the previous one rather than starting from random initialization, so this multi-stage training with knowledge inheritance significantly reduces the computation cost of pre-training. A minimal sketch of the encoder-decoder architecture follows below.
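To make the architecture description concrete, here is a minimal PyTorch sketch of an encoder-decoder Transformer with a bidirectional encoder and a causally masked (unidirectional) decoder. Every name and hyperparameter in it (`EncoderDecoderLM`, `vocab_size=32000`, `d_model=512`, and so on) is an illustrative placeholder, not CPM-2's actual configuration or code.

```python
import torch
import torch.nn as nn


class EncoderDecoderLM(nn.Module):
    """Toy encoder-decoder Transformer in the spirit of CPM-2:
    a bidirectional encoder feeding a unidirectional (causal) decoder.
    All sizes are illustrative placeholders, far below 11B-parameter scale."""

    def __init__(self, vocab_size=32000, d_model=512, nhead=8, num_layers=6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model,
            nhead=nhead,
            num_encoder_layers=num_layers,
            num_decoder_layers=num_layers,
            batch_first=True,
        )
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids, tgt_ids):
        # The encoder attends bidirectionally over the whole source sequence;
        # the causal mask restricts the decoder to left-to-right attention.
        causal_mask = self.transformer.generate_square_subsequent_mask(
            tgt_ids.size(1)
        ).to(tgt_ids.device)
        hidden = self.transformer(
            self.embed(src_ids), self.embed(tgt_ids), tgt_mask=causal_mask
        )
        return self.lm_head(hidden)  # next-token logits over the vocabulary


# Usage: (batch, seq_len) token-id tensors in, per-position logits out.
model = EncoderDecoderLM()
src = torch.randint(0, 32000, (2, 16))
tgt = torch.randint(0, 32000, (2, 16))
logits = model(src, tgt)  # shape: (2, 16, 32000)
```

In the multi-stage setup described above, the same model object would be trained on a Chinese corpus first, then continue from those inherited weights for the bilingual stage, rather than re-initializing between stages.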