
What is: Galactica?

Source: Galactica: A Large Language Model for Science
Year: 2022
Data Source: CC BY-SA - https://paperswithcode.com

Galactica is a language model that uses a decoder-only Transformer architecture with the following modifications (a minimal sketch follows the list):

  • It uses GeLU activations for all model sizes
  • It uses a 2048-token context window for all model sizes
  • It does not use biases in any of the dense kernels or layer norms
  • It uses learned positional embeddings
  • Its vocabulary of 50k tokens was constructed with BPE from a randomly selected 2% subset of the training data
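
The sketch below is a minimal PyTorch illustration of these choices, not the released Galactica implementation: bias-free layer norms and linear layers, GeLU in the feed-forward block, learned positional embeddings over a 2048-token window, and a 50k-entry vocabulary. The layer sizes, the pre-norm block arrangement, and the class names (`GalacticaStyleLM`, `DecoderBlock`, etc.) are illustrative assumptions.

```python
# Minimal sketch of a Galactica-style decoder-only model (illustrative only).
# Hyperparameters (d_model, n_heads, d_ff, n_layers) are placeholders, not the paper's.
import torch
import torch.nn as nn
import torch.nn.functional as F

MAX_CONTEXT = 2048    # context window used for all model sizes
VOCAB_SIZE = 50_000   # BPE vocabulary size

class BiasFreeLayerNorm(nn.Module):
    """LayerNorm with a learned scale but no bias term."""
    def __init__(self, d_model: int, eps: float = 1e-5):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(d_model))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.layer_norm(x, x.shape[-1:], weight=self.weight, eps=self.eps)

class DecoderBlock(nn.Module):
    """Causal self-attention + GeLU feed-forward, with no biases in dense layers."""
    def __init__(self, d_model: int, n_heads: int, d_ff: int):
        super().__init__()
        self.ln_attn = BiasFreeLayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, bias=False, batch_first=True)
        self.ln_ff = BiasFreeLayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff, bias=False),
            nn.GELU(),                               # GeLU activation in the MLP
            nn.Linear(d_ff, d_model, bias=False),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        T = x.size(1)
        # Boolean causal mask: True marks positions a token may NOT attend to.
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), 1)
        h = self.ln_attn(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=causal, need_weights=False)
        x = x + attn_out
        x = x + self.ff(self.ln_ff(x))
        return x

class GalacticaStyleLM(nn.Module):
    """Token + learned positional embeddings feeding a stack of decoder blocks."""
    def __init__(self, n_layers: int = 2, d_model: int = 256, n_heads: int = 4, d_ff: int = 1024):
        super().__init__()
        self.tok_emb = nn.Embedding(VOCAB_SIZE, d_model)
        self.pos_emb = nn.Embedding(MAX_CONTEXT, d_model)   # learned positions
        self.blocks = nn.ModuleList([DecoderBlock(d_model, n_heads, d_ff) for _ in range(n_layers)])
        self.ln_final = BiasFreeLayerNorm(d_model)
        self.lm_head = nn.Linear(d_model, VOCAB_SIZE, bias=False)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        positions = torch.arange(tokens.size(1), device=tokens.device)
        x = self.tok_emb(tokens) + self.pos_emb(positions)
        for block in self.blocks:
            x = block(x)
        return self.lm_head(self.ln_final(x))

# Example: next-token logits for a random batch of token ids.
model = GalacticaStyleLM()
logits = model(torch.randint(0, VOCAB_SIZE, (1, 16)))
print(logits.shape)  # torch.Size([1, 16, 50000])
```

Note that `BiasFreeLayerNorm` is written by hand so the bias-free normalization works on older PyTorch versions; recent PyTorch releases also accept `nn.LayerNorm(..., bias=False)` directly.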