What is: Galactica?
| Source | Galactica: A Large Language Model for Science |
| --- | --- |
| Year | 2022 |
| Data Source | CC BY-SA - https://paperswithcode.com |
Galactica is a language model that uses a Transformer architecture in a decoder-only setup, with the following modifications (a minimal code sketch follows the list):
- GeLU activations for all model sizes
- A context window of length 2,048 for all model sizes
- No biases in any of the dense kernels or layer norms
- Learned positional embeddings
- A vocabulary of 50k tokens, constructed with byte pair encoding (BPE) from a randomly selected 2% subset of the training data
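
To make these choices concrete, here is a minimal sketch of a decoder-only block and model in PyTorch. It is illustrative only, not Galactica's actual implementation: the dimensions, layer count, and class names (`GalacticaStyleBlock`, `GalacticaStyleDecoder`) are invented for the example, and it assumes PyTorch ≥ 2.1, where `nn.LayerNorm` accepts `bias=False`.

```python
import torch
import torch.nn as nn


class GalacticaStyleBlock(nn.Module):
    """One decoder block with Galactica-style choices:
    GeLU activations and no biases in dense layers or layer norms."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        # Layer norms with a learnable scale but no bias term
        # (the `bias` argument requires PyTorch >= 2.1).
        self.ln1 = nn.LayerNorm(d_model, bias=False)
        self.ln2 = nn.LayerNorm(d_model, bias=False)
        self.attn = nn.MultiheadAttention(
            d_model, n_heads, bias=False, batch_first=True
        )
        # Feed-forward network: GeLU activation, no biases.
        self.ff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model, bias=False),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model, bias=False),
        )

    def forward(self, x: torch.Tensor, causal_mask: torch.Tensor) -> torch.Tensor:
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=causal_mask, need_weights=False)
        x = x + attn_out
        return x + self.ff(self.ln2(x))


class GalacticaStyleDecoder(nn.Module):
    """Decoder-only stack: 50k-token vocabulary, learned positional
    embeddings over a 2048-token context window."""

    def __init__(self, vocab_size=50_000, d_model=512, n_heads=8,
                 n_layers=4, max_len=2048):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        # Learned (not sinusoidal or rotary) positional embeddings.
        self.pos_emb = nn.Embedding(max_len, d_model)
        self.blocks = nn.ModuleList(
            [GalacticaStyleBlock(d_model, n_heads) for _ in range(n_layers)]
        )
        self.ln_f = nn.LayerNorm(d_model, bias=False)
        self.head = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, idx: torch.Tensor) -> torch.Tensor:
        _, T = idx.shape
        pos = torch.arange(T, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        # Causal mask: True marks future positions a token may not attend to.
        mask = torch.triu(
            torch.ones(T, T, dtype=torch.bool, device=idx.device), diagonal=1
        )
        for block in self.blocks:
            x = block(x, mask)
        return self.head(self.ln_f(x))


# Usage: logits over the vocabulary at each position.
# model = GalacticaStyleDecoder()
# logits = model(torch.randint(0, 50_000, (1, 16)))  # shape (1, 16, 50_000)
```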
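
The vocabulary step could likewise be sketched with the Hugging Face `tokenizers` library; this is an illustration under stated assumptions, not the paper's actual pipeline. The file name `corpus.txt` (one document per line) and the exact sampling scheme are hypothetical; the source only says the 50k BPE vocabulary was built from a randomly selected 2% of the training data.

```python
import random

from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# Hypothetical corpus file: one training document per line.
with open("corpus.txt", encoding="utf-8") as f:
    docs = f.readlines()

# Randomly select roughly 2% of the documents for vocabulary training.
sample = random.sample(docs, k=max(1, len(docs) // 50))

# Byte-level BPE with a 50k target vocabulary.
tokenizer = Tokenizer(models.BPE())
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel()
trainer = trainers.BpeTrainer(vocab_size=50_000)
tokenizer.train_from_iterator(sample, trainer=trainer)
tokenizer.save("bpe-50k.json")
```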