AICurious Logo

What is: Hierarchical Softmax?

Year2000
Data SourceCC BY-SA - https://paperswithcode.com

Hierarchical Softmax is a is an alternative to softmax that is faster to evaluate: it is O(logn)O\left(\log{n}\right) time to evaluate compared to O(n)O\left(n\right) for softmax. It utilises a multi-layer binary tree, where the probability of a word is calculated through the product of probabilities on each edge on the path to that node. See the Figure to the right for an example of where the product calculation would occur for the word "I'm".

(Introduced by Morin and Bengio)

Image Credit: Steven Schmatz