
What is: GPT-3?

Source: Language Models are Few-Shot Learners
Year: 2020
Data Source: CC BY-SA - https://paperswithcode.com

GPT-3 is an autoregressive transformer language model with 175 billion parameters. It uses the same architecture as GPT-2, including the modified initialization, pre-normalization, and reversible tokenization, with one exception: GPT-3 alternates dense and locally banded sparse attention patterns across the layers of the transformer, similar to the Sparse Transformer.
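The alternation between dense and locally banded attention can be illustrated with the attention masks involved. The sketch below is a minimal illustration, not GPT-3's actual implementation: the layer parity rule, window size, and sequence length are assumptions chosen for readability (GPT-3 uses far longer contexts and wider bands).

```python
import numpy as np

def attention_mask(seq_len: int, layer_idx: int, window: int = 4) -> np.ndarray:
    """Causal attention mask for one transformer layer.

    Illustrative scheme: even-indexed layers use dense (full causal)
    attention, odd-indexed layers use a locally banded pattern where
    each token attends only to the previous `window` positions.
    Returns a boolean matrix; True means position i may attend to j.
    """
    # Full causal mask: each token attends to itself and all earlier tokens.
    causal = np.tril(np.ones((seq_len, seq_len), dtype=bool))
    if layer_idx % 2 == 0:
        return causal  # dense layer
    # Banded layer: additionally restrict attention to a local window.
    rows = np.arange(seq_len)[:, None]
    cols = np.arange(seq_len)[None, :]
    return causal & (rows - cols < window)

# Dense layer: the last token can attend all the way back to position 0.
dense = attention_mask(8, layer_idx=0)
# Banded layer: the last token only sees the previous `window` positions.
local = attention_mask(8, layer_idx=1)
```

Sparse patterns like the banded mask reduce the cost of attention from quadratic toward linear in sequence length, which is part of what makes very long contexts tractable at this scale.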