
What is: Chinchilla?

Source: Training Compute-Optimal Large Language Models
Year: 2022
Data Source: CC BY-SA - https://paperswithcode.com

Chinchilla is a 70B-parameter model trained compute-optimally on 1.4 trillion tokens. The paper's findings suggest that compute-optimal models should scale model size and the number of training tokens in equal proportion. Chinchilla uses the same compute budget as Gopher, i.e. the same number of training FLOPs, but with 4x more training data. It is trained on MassiveText with a slightly modified SentencePiece tokenizer. More architectural details can be found in the paper.
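To make the trade-off concrete, here is a minimal Python sketch (not from the paper) of how a compute budget splits into parameters and tokens, assuming the common C ≈ 6·N·D approximation for training FLOPs and the roughly 20-tokens-per-parameter ratio implied by scaling both in equal proportion; the example budget value is illustrative.

# Minimal sketch of the Chinchilla compute-optimal allocation.
# Assumptions (not taken verbatim from the paper): training FLOPs
# C ~ 6 * N * D for N parameters and D tokens, and a fixed ratio of
# roughly 20 training tokens per parameter.

def compute_optimal_allocation(flops_budget, tokens_per_param=20.0):
    """Split a FLOP budget into a parameter count and a token count."""
    # From C = 6 * N * D and D = tokens_per_param * N:
    #   N = sqrt(C / (6 * tokens_per_param)),  D = tokens_per_param * N
    n_params = (flops_budget / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Example: a Gopher/Chinchilla-scale budget of roughly 5.76e23 FLOPs
params, tokens = compute_optimal_allocation(5.76e23)
print(f"~{params / 1e9:.0f}B parameters, ~{tokens / 1e12:.1f}T tokens")
# -> roughly 70B parameters and 1.4T tokens, matching Chinchilla's setup

Under these assumptions the same budget that trained the 280B-parameter Gopher instead yields a smaller model trained on far more data, which is exactly the trade Chinchilla makes.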