
What is: TaBERT?

Source: TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data
Year: 2020
Data Source: CC BY-SA - https://paperswithcode.com

TaBERT is a pretrained language model (LM) that jointly learns representations for natural language sentences and (semi-)structured tables. TaBERT is trained on a large corpus of 26 million tables and their English contexts.

In summary, TaBERT learns representations for an NL sentence and a table as follows. Given an utterance u and a table T, TaBERT first creates a content snapshot of T: a small set of sampled rows that summarize the information in T most relevant to the utterance. Each row in the snapshot is linearized, concatenated with the utterance, and fed to a Transformer, which outputs row-wise encoding vectors for the utterance tokens and cells. The encodings for all rows in the snapshot are then passed through a series of vertical self-attention layers, in which a cell representation (or an utterance token representation) is computed by attending to the vertically aligned vectors of the same column (or the same NL token). Finally, a pooling layer produces the final representation for each utterance token and each column.
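As a rough illustration of the content-snapshot and row-linearization steps, here is a minimal Python sketch. This is not the authors' implementation: the helper names (`select_content_snapshot`, `linearize_row`) are hypothetical, and the snapshot scoring is a simplified token-overlap stand-in for the paper's n-gram relevance heuristic. The per-cell format "column name | type | value" follows the linearization described in the paper.

```python
# Sketch of TaBERT-style content snapshot selection + row linearization.
# Helper names are hypothetical; only the overall shape follows the paper.

from typing import Dict, List


def select_content_snapshot(utterance: str,
                            rows: List[Dict[str, str]],
                            k: int = 3) -> List[Dict[str, str]]:
    """Pick the k rows whose cell values overlap most with the utterance
    (simple token overlap as a stand-in for the paper's n-gram heuristic)."""
    utt_tokens = set(utterance.lower().split())

    def overlap(row: Dict[str, str]) -> int:
        cell_tokens = set(" ".join(row.values()).lower().split())
        return len(utt_tokens & cell_tokens)

    return sorted(rows, key=overlap, reverse=True)[:k]


def linearize_row(utterance: str,
                  row: Dict[str, str],
                  column_types: Dict[str, str]) -> str:
    """Concatenate the utterance with one linearized row.
    Each cell becomes "column | type | value"; cells are joined by [SEP]."""
    cells = [f"{col} | {column_types[col]} | {val}" for col, val in row.items()]
    return "[CLS] " + utterance + " [SEP] " + " [SEP] ".join(cells) + " [SEP]"


if __name__ == "__main__":
    utterance = "which country hosted the 2008 olympics?"
    table = [
        {"Year": "2004", "Host City": "Athens", "Country": "Greece"},
        {"Year": "2008", "Host City": "Beijing", "Country": "China"},
        {"Year": "2012", "Host City": "London", "Country": "United Kingdom"},
    ]
    types = {"Year": "real", "Host City": "text", "Country": "text"}

    # Each linearized string below would be one per-row input to the
    # Transformer; vertical self-attention then aligns cells across rows.
    for row in select_content_snapshot(utterance, table, k=2):
        print(linearize_row(utterance, row, types))
```

In the full model, each such string is encoded independently by the Transformer, and the resulting per-row vectors for the same column (or the same utterance token) are what the vertical self-attention layers attend over before pooling.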