What is: TILDEv2?

Source: Fast Passage Re-ranking with Contextualized Exact Term Matching and Efficient Passage Expansion
Year: 2021
Data Source: CC BY-SA - https://paperswithcode.com

TILDEv2 is a BERT-based re-ranking method that builds on TILDE and addresses its limitations. It relies on contextualized exact term matching over expanded passages. As a result, the index only needs to store scores for the tokens that appear in the expanded passages (rather than for the entire vocabulary), producing indexes that are 99% smaller than those of the original TILDE.
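As a rough back-of-the-envelope illustration of where this saving comes from, the sketch below compares the number of scores stored per passage under the two schemes. The figure of 150 unique tokens per expanded passage is an assumption chosen for illustration, not a number from the source; 30,522 is the WordPiece vocabulary size of BERT-base uncased. The calculation counts stored scores only and ignores the space needed for token ids.

```python
# Back-of-the-envelope comparison of per-passage index entries.
BERT_VOCAB_SIZE = 30_522          # BERT-base uncased WordPiece vocabulary
UNIQUE_TOKENS_PER_PASSAGE = 150   # assumed value, for illustration only


def index_reduction(vocab_size: int, stored_tokens: int) -> float:
    """Fraction of per-passage scores saved by storing only the tokens
    that occur in the passage instead of one score per vocabulary entry."""
    if not 0 <= stored_tokens <= vocab_size:
        raise ValueError("stored_tokens must be between 0 and vocab_size")
    return 1.0 - stored_tokens / vocab_size


reduction = index_reduction(BERT_VOCAB_SIZE, UNIQUE_TOKENS_PER_PASSAGE)
print(f"{reduction:.1%} fewer stored scores per passage")
```

Under these assumed numbers the reduction lands in the same range as the 99% quoted above; the real figure depends on passage length and on how many expansion tokens are added.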

Specifically, TILDE is modified in the following aspects:

  • Exact Term Matching. The query likelihood matching originally employed in TILDE expands passages to the size of the BERT vocabulary, resulting in large indexes. To overcome this issue, relevance scores are instead estimated with contextualized exact term matching. This allows the model to index only the tokens present in the passage, thus reducing the index size. In addition, the query likelihood loss function is replaced with the noise contrastive estimation (NCE) loss, which makes better use of negative training samples.

  • Passage Expansion. To overcome the vocabulary mismatch problem that affects exact term matching methods, the original passage collection is expanded: each passage is expanded with a limited number of tokens using deep LMs. TILDEv2 therefore only needs to index a few extra tokens in addition to those in the original passages.
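The two modifications can be illustrated with a minimal sketch of query-time scoring against a pre-computed index. Everything here is hypothetical: the function names, the toy tokens, and the scores are made up, and in a real system the per-token scores would come from the BERT-based model at indexing time. Keeping the maximum score for a repeated passage token and counting repeated query tokens are assumptions of this sketch, not details stated above.

```python
from collections import Counter
from typing import Dict, List, Tuple


def build_entry(token_scores: List[Tuple[str, float]]) -> Dict[str, float]:
    """Build one index entry: token -> score, only for tokens in the passage.
    Repeated tokens keep their maximum score (an assumption of this sketch)."""
    entry: Dict[str, float] = {}
    for token, score in token_scores:
        if token not in entry or score > entry[token]:
            entry[token] = score
    return entry


def score_passage(query_tokens: List[str], entry: Dict[str, float]) -> float:
    """Exact term matching: sum the stored scores of query tokens that occur
    in the passage; query tokens missing from the passage contribute 0."""
    counts = Counter(query_tokens)
    return sum(n * entry.get(token, 0.0) for token, n in counts.items())


# Made-up contextualized scores for the original passage tokens.
original = [("cheap", 1.2), ("flights", 2.0), ("to", 0.1), ("paris", 2.5)]
# Made-up scores for a few tokens added by passage expansion.
expansion = [("airfare", 1.5), ("france", 0.8)]

query = ["airfare", "paris"]
score_without = score_passage(query, build_entry(original))
score_with = score_passage(query, build_entry(original + expansion))
print(score_without, score_with)
```

In this toy example the query token "airfare" does not occur in the original passage, so it only contributes to the score once the passage has been expanded; this is the vocabulary mismatch that passage expansion is meant to address. Because each entry holds only the tokens of the expanded passage, scoring at query time reduces to a handful of dictionary lookups.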