
What is: Self-Adversarial Negative Sampling?

Source: RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space
Year: 2019
Data Source: CC BY-SA - https://paperswithcode.com

Self-Adversarial Negative Sampling is a negative sampling technique used for methods such as word embeddings and knowledge graph embeddings. The traditional negative sampling loss from word2vec, used to optimize distance-based models, can be written as:

$$L = -\log\sigma\left(\gamma - d_r\left(\mathbf{h}, \mathbf{t}\right)\right) - \sum^{n}_{i=1}\frac{1}{k}\log\sigma\left(d_r\left(\mathbf{h}'_i, \mathbf{t}'_i\right) - \gamma\right)$$

where $\gamma$ is a fixed margin, $\sigma$ is the sigmoid function, and $(\mathbf{h}'_i, r, \mathbf{t}'_i)$ is the $i$-th negative triplet.
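
As a minimal sketch (assuming a PyTorch setup in which the hypothetical tensors `pos_dist` and `neg_dist` already hold $d_r(\mathbf{h}, \mathbf{t})$ for the positive triplets and $d_r(\mathbf{h}'_i, \mathbf{t}'_i)$ for $k$ sampled negatives per positive), the uniform negative sampling loss could be computed like this:

```python
import torch
import torch.nn.functional as F

def uniform_negative_sampling_loss(pos_dist: torch.Tensor,
                                   neg_dist: torch.Tensor,
                                   gamma: float) -> torch.Tensor:
    """Uniform negative sampling loss.

    pos_dist: shape (batch,)    -- d_r(h, t) for positive triplets
    neg_dist: shape (batch, k)  -- d_r(h'_i, t'_i) for k negatives per positive
    gamma:    fixed margin
    """
    # -log sigma(gamma - d_r(h, t)) for the positive triplet
    pos_loss = -F.logsigmoid(gamma - pos_dist)
    # each of the k uniformly drawn negatives contributes with weight 1/k
    neg_loss = -F.logsigmoid(neg_dist - gamma).mean(dim=1)
    return (pos_loss + neg_loss).mean()
```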

The negative sampling loss above samples negative triplets uniformly. Such uniform negative sampling is inefficient: as training goes on, many of the sampled triplets are obviously false and provide no meaningful information. Therefore, the authors propose an approach called self-adversarial negative sampling, which samples negative triplets according to the current embedding model. Specifically, negative triplets are sampled from the following distribution:

$$p\left(h'_j, r, t'_j \mid \left\{\left(h_i, r_i, t_i\right)\right\}\right) = \frac{\exp \alpha f_r\left(\mathbf{h}'_j, \mathbf{t}'_j\right)}{\sum_{i} \exp \alpha f_r\left(\mathbf{h}'_i, \mathbf{t}'_i\right)}$$

where $\alpha$ is the temperature of sampling. Moreover, since the sampling procedure may be costly, the authors treat the above probability as the weight of the negative sample. Therefore, the final negative sampling loss with self-adversarial training takes the following form:

$$L = -\log\sigma\left(\gamma - d_r\left(\mathbf{h}, \mathbf{t}\right)\right) - \sum^{n}_{i=1}p\left(h'_i, r, t'_i\right)\log\sigma\left(d_r\left(\mathbf{h}'_i, \mathbf{t}'_i\right) - \gamma\right)$$
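
A corresponding sketch for the self-adversarial variant, under the same assumptions as the block above and additionally taking the score $f_r$ to be the negated distance $-d_r$ (a common choice for distance-based models, not fixed by the excerpt here); the softmax weights are detached so they act as constants and no gradient flows through the sampling probabilities:

```python
import torch
import torch.nn.functional as F

def self_adversarial_loss(pos_dist: torch.Tensor,
                          neg_dist: torch.Tensor,
                          gamma: float,
                          alpha: float) -> torch.Tensor:
    """Negative sampling loss with self-adversarial weighting.

    pos_dist: shape (batch,)    -- d_r(h, t) for positive triplets
    neg_dist: shape (batch, k)  -- d_r(h'_i, t'_i) for k negatives per positive
    gamma:    fixed margin
    alpha:    temperature of sampling
    """
    # p(h'_i, r, t'_i): softmax over the negatives of alpha * f_r, with f_r = -d_r here.
    # detach(): the weights come from the current model but receive no gradient.
    weights = F.softmax(-alpha * neg_dist, dim=1).detach()

    pos_loss = -F.logsigmoid(gamma - pos_dist)
    neg_loss = -(weights * F.logsigmoid(neg_dist - gamma)).sum(dim=1)
    return (pos_loss + neg_loss).mean()
```

Using the probabilities only as detached per-sample weights mirrors the description above: the model emphasizes hard negatives without requiring an explicit, costly resampling step.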