What is: lda2vec?
Source | Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec |
Year | 2016 |
Data Source | CC BY-SA - https://paperswithcode.com |
lda2vec builds representations over both words and documents by mixing word2vec’s skipgram architecture with Dirichlet-optimized sparse topic mixtures.
The Skipgram Negative-Sampling (SGNS) objective of word2vec is modified to utilize document-wide feature vectors while simultaneously learning continuous document weights loading onto topic vectors. The total loss is the sum of the SGNS loss and a Dirichlet-likelihood term over document weights, $L^{d}$. The SGNS loss is computed using a context vector $\vec{c_{j}}$, pivot word vector $\vec{w_{j}}$, target word vector $\vec{w_{i}}$, and negatively-sampled word vector $\vec{w_{l}}$:

$$L = L^{d} + \sum_{ij} L_{ij}^{neg}$$

$$L_{ij}^{neg} = \log \sigma\left(\vec{c_{j}} \cdot \vec{w_{i}}\right) + \sum_{l=0}^{n} \log \sigma\left(-\vec{c_{j}} \cdot \vec{w_{l}}\right)$$

where the context vector is the sum of the pivot word vector and a document vector that is itself a mixture of topic vectors: $\vec{c_{j}} = \vec{w_{j}} + \vec{d_{j}}$.
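The construction above can be sketched in a few lines of NumPy. This is a minimal illustration, not the reference implementation: the dimensions, the random toy vectors, and the hyperparameter names `lam` and `alpha` are assumptions chosen for demonstration. It shows how a document vector is formed as a softmax-weighted mixture of topic vectors, added to the pivot word vector to form the context vector, and scored against a target word and negative samples.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_neg_loss(context, target, negatives):
    """Negative SGNS term: the context vector should score high against
    the true target word and low against the sampled noise words."""
    pos = np.log(sigmoid(context @ target))
    neg = np.sum(np.log(sigmoid(-negatives @ context)))
    return -(pos + neg)  # minimized during training

rng = np.random.default_rng(0)
dim, n_topics, n_neg = 8, 3, 5  # toy sizes (assumption)

word_vec = rng.normal(size=dim)                 # pivot word vector w_j
topic_vecs = rng.normal(size=(n_topics, dim))   # topic vectors t_k

# Document weights: a softmax over unconstrained parameters gives the
# topic mixture p_jk that the Dirichlet term regularizes toward sparsity.
doc_weights = np.array([2.0, -1.0, 0.5])
p = np.exp(doc_weights) / np.exp(doc_weights).sum()
doc_vec = p @ topic_vecs                        # d_j = sum_k p_jk * t_k

context = word_vec + doc_vec                    # c_j = w_j + d_j

target = rng.normal(size=dim)                   # target word vector w_i
negatives = rng.normal(size=(n_neg, dim))       # sampled word vectors w_l

loss = sgns_neg_loss(context, target, negatives)

# Dirichlet-likelihood penalty over the document weights; lam and alpha
# are hypothetical hyperparameter values (alpha < 1 encourages sparsity).
lam, alpha = 200.0, 0.7
dirichlet_loss = -lam * (alpha - 1.0) * np.log(p).sum()
total_loss = loss + dirichlet_loss
```

In a full model the word vectors, topic vectors, and document weights would all be trainable parameters updated by gradient descent on `total_loss`; the sketch only evaluates the objective once on random values.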