What is: Adaptive Softmax?
Source | Efficient softmax approximation for GPUs |
Year | 2016 |
Data Source | CC BY-SA - https://paperswithcode.com |
Adaptive Softmax is a technique for speeding up the computation of probability distributions over words. It is inspired by the class-based hierarchical softmax, but the word classes are built to minimize computation time. Adaptive softmax achieves its efficiency by explicitly taking into account the cost of matrix multiplication on parallel systems such as GPUs and combining it with a few important observations, namely keeping a shortlist of frequent words in the root node and reducing the capacity allotted to rare words.
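The structure described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the class name `AdaptiveSoftmaxSketch`, the `cutoffs` and `tail_dim` parameters, and the random weights are all assumptions made for illustration, and word ids are assumed to be sorted by decreasing frequency. The root node (head) scores the shortlist words plus one entry per rare-word cluster, and each cluster first projects the hidden state to a smaller dimension to model the reduced capacity for rare words.

```python
import math
import random


def log_softmax(scores):
    """Numerically stable log-softmax over a list of floats."""
    m = max(scores)
    lse = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - lse for s in scores]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def rand_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class AdaptiveSoftmaxSketch:
    """Hypothetical two-level adaptive softmax (illustration only).

    cutoffs=[4, 8, 12] means: ids 0..3 form the shortlist,
    ids 4..7 are tail cluster 0, ids 8..11 are tail cluster 1.
    """

    def __init__(self, hidden_dim, cutoffs, tail_dim, seed=0):
        rng = random.Random(seed)
        self.cutoffs = cutoffs
        self.shortlist = cutoffs[0]
        self.n_clusters = len(cutoffs) - 1
        # Root node: one row per shortlist word plus one row per cluster.
        self.head = rand_matrix(self.shortlist + self.n_clusters, hidden_dim, rng)
        # Each tail cluster: a projection to tail_dim (reduced capacity),
        # followed by an output layer over the words in that cluster.
        self.tails = []
        for i in range(self.n_clusters):
            size = cutoffs[i + 1] - cutoffs[i]
            proj = rand_matrix(tail_dim, hidden_dim, rng)
            out = rand_matrix(size, tail_dim, rng)
            self.tails.append((proj, out))

    def log_prob(self, hidden, word):
        """Return log p(word | hidden)."""
        head_lp = log_softmax(matvec(self.head, hidden))
        if word < self.shortlist:
            # Frequent word: a single small softmax at the root suffices.
            return head_lp[word]
        for i in range(self.n_clusters):
            lo, hi = self.cutoffs[i], self.cutoffs[i + 1]
            if lo <= word < hi:
                proj, out = self.tails[i]
                tail_lp = log_softmax(matvec(out, matvec(proj, hidden)))
                # log p(cluster | hidden) + log p(word | cluster, hidden)
                return head_lp[self.shortlist + i] + tail_lp[word - lo]
        raise ValueError("word id out of range")


model = AdaptiveSoftmaxSketch(hidden_dim=8, cutoffs=[4, 8, 12], tail_dim=2)
hidden = [0.1 * i for i in range(8)]
# Summing over the whole vocabulary should give a valid distribution.
total = sum(math.exp(model.log_prob(hidden, w)) for w in range(12))
```

In this sketch a frequent word costs one softmax over the head, while a rare word costs the head plus one low-dimensional tail cluster, which is where the savings over a full-vocabulary softmax would come from.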