What is: GPT-NeoX?
Source | GPT-NeoX-20B: An Open-Source Autoregressive Language Model |
Year | 2022 |
Data Source | CC BY-SA - https://paperswithcode.com |
GPT-NeoX is an autoregressive transformer decoder model whose architecture largely follows that of GPT-3, with a few notable deviations. The model has 20 billion parameters, 44 layers, a hidden dimension of 6144, and 64 attention heads. The main differences from GPT-3 are a different tokenizer, the addition of Rotary Positional Embeddings (RoPE), the parallel computation of the attention and feed-forward layers (sketched below), and a different initialization scheme and choice of hyperparameters.
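The parallel residual is the most code-visible of these changes. The PyTorch sketch below contrasts it with the serial GPT-3-style block; it substitutes a standard `nn.MultiheadAttention` for GPT-NeoX's rotary-embedding attention, and the class and variable names are illustrative, not the actual GPT-NeoX implementation.

```python
import torch
import torch.nn as nn

class ParallelTransformerBlock(nn.Module):
    """Minimal sketch of GPT-NeoX's parallel attention + feed-forward block.

    A GPT-3-style block runs its sublayers in series:
        x = x + Attn(LN1(x));  x = x + FF(LN2(x))
    GPT-NeoX instead computes them in parallel and sums the results:
        x = x + Attn(LN1(x)) + FF(LN2(x))
    Rotary embeddings and model-parallel details are omitted here.
    """

    def __init__(self, hidden_size: int, num_heads: int):
        super().__init__()
        self.ln_attn = nn.LayerNorm(hidden_size)
        self.ln_ff = nn.LayerNorm(hidden_size)
        self.attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(hidden_size, 4 * hidden_size),
            nn.GELU(),
            nn.Linear(4 * hidden_size, hidden_size),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        seq_len = x.size(1)
        # Causal mask: each position may attend only to itself and earlier positions.
        causal = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device),
            diagonal=1,
        )
        h = self.ln_attn(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=causal, need_weights=False)
        # Parallel residual: both sublayers read the same input x,
        # so they can be computed concurrently and summed.
        return x + attn_out + self.ff(self.ln_ff(x))


# Toy dimensions for illustration; GPT-NeoX-20B uses hidden_size=6144
# and num_heads=64 across 44 such layers.
block = ParallelTransformerBlock(hidden_size=256, num_heads=8)
out = block(torch.randn(2, 16, 256))  # (batch, seq, hidden)
```

Because the two sublayers no longer depend on each other's output, their matrix multiplications can be fused or overlapped, which is the throughput motivation the GPT-NeoX authors cite for this design.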