
What is: CPM-2?

Source: CPM-2: Large-scale Cost-effective Pre-trained Language Models
Year: 2021
Data Source: CC BY-SA - https://paperswithcode.com

CPM-2 is an 11-billion-parameter pre-trained language model based on a standard Transformer architecture consisting of a bidirectional encoder and a unidirectional decoder. The model is pre-trained on WuDaoCorpus, which contains 2.3 TB of cleaned Chinese data as well as 300 GB of cleaned English data. The pre-training process of CPM-2 is divided into three stages: Chinese pre-training, bilingual pre-training, and MoE pre-training. This multi-stage training scheme, combined with knowledge inheritance, significantly reduces the overall computation cost of pre-training.
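To make the architecture concrete, below is a minimal PyTorch sketch of the encoder-decoder layout the entry describes: the encoder attends over the whole input without a mask (bidirectional), while the decoder's self-attention is causally masked (unidirectional). This is not CPM-2's actual code, and every name and size here is an illustrative assumption, far smaller than CPM-2's 11 billion parameters.

```python
import torch
import torch.nn as nn

class TinyEncoderDecoderLM(nn.Module):
    """Toy encoder-decoder Transformer in the same spirit as CPM-2's layout.
    Hyperparameters are placeholders, not CPM-2's real configuration."""

    def __init__(self, vocab_size=32000, d_model=256, nhead=4, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model,
            nhead=nhead,
            num_encoder_layers=num_layers,
            num_decoder_layers=num_layers,
            batch_first=True,
        )
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids, tgt_ids):
        src = self.embed(src_ids)  # encoder input: no mask, fully bidirectional
        tgt = self.embed(tgt_ids)  # decoder input: causally masked below
        causal_mask = self.transformer.generate_square_subsequent_mask(
            tgt_ids.size(1)
        )  # upper-triangular mask so position t only sees positions <= t
        hidden = self.transformer(src, tgt, tgt_mask=causal_mask)
        return self.lm_head(hidden)  # next-token logits over the vocabulary

model = TinyEncoderDecoderLM()
src = torch.randint(0, 32000, (1, 16))  # e.g. an encoded input passage
tgt = torch.randint(0, 32000, (1, 8))   # tokens generated so far
logits = model(src, tgt)
print(logits.shape)  # torch.Size([1, 8, 32000])
```

The split mirrors the description above: the encoder reads the full input in both directions, and the decoder generates output left to right, one token at a time.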