
What is: GPT-NeoX?

Source: GPT-NeoX-20B: An Open-Source Autoregressive Language Model
Year: 2022
Data Source: CC BY-SA - https://paperswithcode.com

GPT-NeoX is an autoregressive transformer decoder model whose architecture largely follows that of GPT-3, with a few notable deviations. The model has 20 billion parameters, with 44 layers, a hidden dimension of 6144, and 64 attention heads. The main differences from GPT-3 are a different tokenizer, the use of Rotary Positional Embeddings in place of learned positional embeddings, the parallel computation of the attention and feed-forward layers, and a different initialization scheme and hyperparameters.
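Two of these deviations can be sketched compactly. The rotary embedding rotates each even/odd pair of feature dimensions by a position-dependent angle, and the parallel block computes attention and the feed-forward network from the same input rather than sequentially. The following is a minimal NumPy sketch under simplifying assumptions (single sequence, no heads, rotary applied to the full dimension, illustrative function names), not the project's actual implementation:

```python
import numpy as np

def rotary_embed(x, base=10000.0):
    """Apply rotary positional embeddings to x of shape (seq_len, dim).

    Each (even, odd) pair of dimensions is rotated by angle
    pos * base**(-2i/dim), so dot products between rotated query/key
    vectors depend only on their relative positions.
    """
    seq_len, dim = x.shape
    half = dim // 2
    inv_freq = base ** (-2.0 * np.arange(half) / dim)    # (half,)
    angles = np.arange(seq_len)[:, None] * inv_freq      # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin                   # 2-D rotation per pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

def parallel_block(x, attn, mlp, ln1, ln2):
    """GPT-NeoX-style parallel residual: x + Attn(LN1(x)) + MLP(LN2(x)).

    GPT-3 computes these sequentially (the MLP sees the attention output);
    the parallel form lets both sublayers run on the same input.
    """
    return x + attn(ln1(x)) + mlp(ln2(x))
```

Because the rotary transform is a pure rotation, it preserves vector norms and leaves position 0 unchanged (all angles are zero there), which is a quick way to sanity-check an implementation.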