Viet-Anh on Software

What is: lda2vec?

Source: Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec
Year: 2016
Data Source: CC BY-SA - https://paperswithcode.com

lda2vec builds representations over both words and documents by mixing word2vec’s skipgram architecture with Dirichlet-optimized sparse topic mixtures.
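The document side of this architecture can be illustrated with a short sketch. This minimal Python version shows how a document vector can be built as a topic mixture and then combined with a word vector. It assumes the construction described in the lda2vec paper:

- Unconstrained document weights pass through a softmax to give topic proportions $p_{jk}$.
- The document vector is $\overrightarrow{d_{j}} = \sum_k p_{jk}\overrightarrow{t_{k}}$ for topic vectors $\overrightarrow{t_{k}}$.
- The context vector is $\overrightarrow{c_{j}} = \overrightarrow{w_{j}} + \overrightarrow{d_{j}}$.

The helper names (`softmax`, `document_vector`, `context_vector`) and the toy numbers are hypothetical and illustrative. They do not come from any lda2vec library.

```python
import math
from typing import List

Vector = List[float]


def softmax(weights: Vector) -> Vector:
    """Turn unconstrained document weights into topic proportions that sum to 1."""
    m = max(weights)  # subtract the max for numerical stability
    exps = [math.exp(w - m) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]


def document_vector(doc_weights: Vector, topic_vectors: List[Vector]) -> Vector:
    """Mix topic vectors by the document's topic proportions: d_j = sum_k p_jk * t_k."""
    proportions = softmax(doc_weights)
    dim = len(topic_vectors[0])
    return [
        sum(p * topic[i] for p, topic in zip(proportions, topic_vectors))
        for i in range(dim)
    ]


def context_vector(pivot_word: Vector, doc_vec: Vector) -> Vector:
    """Context vector for the SGNS term (assumed form): c_j = w_j + d_j."""
    return [w + d for w, d in zip(pivot_word, doc_vec)]


# Toy example (made-up numbers): 2 topics in a 3-dimensional embedding space.
topics = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
d = document_vector([0.0, 0.0], topics)  # equal weights -> [0.5, 0.5, 0.0]
c = context_vector([0.1, 0.2, 0.3], d)   # approximately [0.6, 0.7, 0.3]
```

The softmax keeps the proportions positive and summing to one. Sparsity is not built into this step. It is encouraged during training by the Dirichlet term on the document weights.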

The Skipgram Negative-Sampling (SGNS) objective of word2vec is modified to use document-wide feature vectors. At the same time, the model learns continuous document weights that load onto topic vectors. The total loss $L$ is the sum of the SGNS loss $L^{neg}_{ij}$ over pivot/target pairs plus a Dirichlet-likelihood term over document weights, $L^{d}$. The SGNS term is computed from several vectors:

- a context vector $\overrightarrow{c_{j}}$, formed by adding the pivot word vector $\overrightarrow{w_{j}}$ to the document vector $\overrightarrow{d_{j}}$, so that $\overrightarrow{c_{j}} = \overrightarrow{w_{j}} + \overrightarrow{d_{j}}$;
- the target word vector $\overrightarrow{w_{i}}$;
- the negatively sampled word vectors $\overrightarrow{w_{l}}$.

The loss is then:

$$L = L^{d} + \sum_{ij} L^{neg}_{ij}$$

$$L^{neg}_{ij} = \log\sigma\left(\overrightarrow{c_{j}}\cdot\overrightarrow{w_{i}}\right) + \sum_{l=0}^{n}\log\sigma\left(-\overrightarrow{c_{j}}\cdot\overrightarrow{w_{l}}\right)$$
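The loss terms can be evaluated directly. This Python sketch computes $L^{neg}_{ij}$ in the standard SGNS form, with a log-sigmoid on the positive pair and on each negative sample, and then adds a Dirichlet term.

The exact form of $L^{d}$ is an assumption here, taken from the original lda2vec paper: $L^{d} = \lambda\sum_{jk}(\alpha - 1)\log p_{jk}$. In this form, $p_{jk}$ are document-topic proportions, $\alpha$ is the Dirichlet concentration, and $\lambda$ is a weighting hyperparameter.

All function names and values are hypothetical. Taken as written, both terms are log-likelihood style quantities, so training would maximize $L$ (equivalently, minimize $-L$). This sketch only evaluates $L$ and performs no training.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigma(x))."""
    # Split on the sign so exp() never receives a large positive argument.
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def sgns_term(context: Vector, target: Vector, negatives: List[Vector]) -> float:
    """L^neg_ij = log sigma(c_j . w_i) + sum over negatives of log sigma(-c_j . w_l)."""
    positive = log_sigmoid(dot(context, target))
    negative = sum(log_sigmoid(-dot(context, w_l)) for w_l in negatives)
    return positive + negative


def dirichlet_term(proportions: List[List[float]], alpha: float, lam: float) -> float:
    """Assumed form: L^d = lambda * sum over documents j and topics k of (alpha - 1) * log p_jk."""
    return lam * sum((alpha - 1.0) * math.log(p) for doc in proportions for p in doc)


# Toy values (made up): one document, one (pivot, target) pair, two negatives.
c_j = [0.6, 0.7, 0.3]
w_i = [0.5, 0.4, 0.1]
negs = [[-0.2, 0.1, 0.0], [0.0, -0.3, 0.2]]
doc_topic_props = [[0.5, 0.5]]  # topic proportions of the single document

L_neg = sgns_term(c_j, w_i, negs)
L_d = dirichlet_term(doc_topic_props, alpha=0.7, lam=1.0)  # hypothetical hyperparameters
L = L_d + L_neg  # L = L^d + sum_ij L^neg_ij, with a single pair here
```

With $\alpha < 1$, the factor $(\alpha - 1)$ is negative, so $(\alpha - 1)\log p_{jk}$ grows as a proportion shrinks toward zero. This is how the Dirichlet term favors sparse topic mixtures when $L$ is maximized.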