What is: Electric?
Source | Pre-Training Transformers as Energy-Based Cloze Models |
Year | 2020 |
Data Source | CC BY-SA - https://paperswithcode.com |
Electric is an energy-based cloze model for representation learning over text. Like BERT, it is a conditional generative model of tokens given their contexts. However, Electric does not use masking or output a full distribution over tokens that could occur in a context. Instead, it assigns a scalar energy score to each input token indicating how likely it is given its context.
Specifically, like BERT, Electric also models $p_{\text{data}}(x_t \mid \mathbf{x}_{\setminus t})$, the distribution of a token $x_t$ given its surrounding context $\mathbf{x}_{\setminus t}$, but does not use masking or a softmax layer. Electric first maps the unmasked input $\mathbf{x} = [x_1, \dots, x_n]$ into contextualized vector representations $\mathbf{h}(\mathbf{x}) = [\mathbf{h}_1, \dots, \mathbf{h}_n]$ using a transformer network. The model assigns a given position $t$ an energy score

$$E(\mathbf{x})_t = \mathbf{w}^{\top} \mathbf{h}(\mathbf{x})_t$$

using a learned weight vector $\mathbf{w}$. The energy function defines a distribution over the possible tokens at position $t$ as

$$p_{\theta}(x_t \mid \mathbf{x}_{\setminus t}) = \frac{\exp\left(-E(\mathbf{x})_t\right)}{Z_{\theta}(\mathbf{x}_{\setminus t})} = \frac{\exp\left(-E(\mathbf{x})_t\right)}{\sum_{x' \in \mathcal{V}} \exp\left(-E(\text{REPLACE}(\mathbf{x}, t, x'))_t\right)}$$

where $\text{REPLACE}(\mathbf{x}, t, x')$ denotes replacing the token at position $t$ with $x'$ and $\mathcal{V}$ is the vocabulary, in practice usually word pieces. Unlike BERT, which produces the probabilities for all possible tokens $x'$ with a single softmax layer, each candidate $x'$ must be passed in as input to the transformer. As a result, computing $p_{\theta}$ is prohibitively expensive because the partition function $Z_{\theta}(\mathbf{x}_{\setminus t})$ requires running the transformer $|\mathcal{V}|$ times; unlike most EBMs, the intractability of $Z_{\theta}(\mathbf{x}_{\setminus t})$ is due more to the expensive scoring function than to a large sample space.
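To make the cost of the partition function concrete, below is a minimal PyTorch sketch (not the authors' implementation) of the energy score $E(\mathbf{x})_t = \mathbf{w}^{\top}\mathbf{h}(\mathbf{x})_t$ and the exact distribution $p_{\theta}(x_t \mid \mathbf{x}_{\setminus t})$. The toy encoder, hidden size, vocabulary size, and names such as `ToyElectric` and `naive_token_distribution` are illustrative assumptions; the point is that computing the distribution exactly takes one transformer pass per candidate token, i.e. $|\mathcal{V}|$ passes.

```python
# Minimal sketch (illustrative only, not the paper's implementation) of
# Electric's energy scoring and the naive, intractable partition function.
import torch
import torch.nn as nn


class ToyElectric(nn.Module):
    def __init__(self, vocab_size: int = 100, hidden: int = 32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        # Learned weight vector w used in the energy score E(x)_t = w^T h(x)_t
        self.w = nn.Parameter(torch.randn(hidden))
        self.vocab_size = vocab_size

    def energy(self, x: torch.Tensor) -> torch.Tensor:
        """Return E(x)_t for every position t. x: (batch, seq) token ids."""
        h = self.encoder(self.embed(x))   # contextualized vectors h(x)
        return h @ self.w                 # (batch, seq) energy scores

    def naive_token_distribution(self, x: torch.Tensor, t: int) -> torch.Tensor:
        """Exact distribution over tokens at position t: one transformer
        pass per candidate token x', i.e. |V| forward passes in total."""
        energies = []
        for cand in range(self.vocab_size):   # REPLACE(x, t, x') for every x'
            x_rep = x.clone()
            x_rep[:, t] = cand
            energies.append(self.energy(x_rep)[:, t])
        energies = torch.stack(energies, dim=-1)   # (batch, |V|)
        return torch.softmax(-energies, dim=-1)    # exp(-E) / Z


model = ToyElectric()
tokens = torch.randint(0, 100, (1, 8))
print(model.energy(tokens).shape)                              # torch.Size([1, 8])
print(model.naive_token_distribution(tokens, t=3).sum().item())  # ~1.0, after |V| passes
```

Even in this toy setting, the loop over the vocabulary dominates the runtime, which is exactly why $Z_{\theta}(\mathbf{x}_{\setminus t})$ is treated as intractable for realistic word-piece vocabularies.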