
What is: Neural Cache?

Source: Improving Neural Language Models with a Continuous Cache
Year: 2016
Data Source: CC BY-SA - https://paperswithcode.com

A Neural Cache, or Continuous Cache, is a module for language modelling that stores previous hidden states in memory cells. These states later serve as keys for retrieving the word each one generated, i.e. the next word in the sequence. No transformation is applied to the stored states during writing or reading.
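As a concrete illustration, here is a minimal sketch of the storage (the class and method names are illustrative, not from the paper): writing appends the pair $(h_i, x_{i+1})$ as-is, and reading returns the stored pairs untouched.

```python
from typing import List, Tuple
import numpy as np

class NeuralCache:
    """Memory cells holding (hidden state, next word) pairs verbatim."""

    def __init__(self) -> None:
        self.slots: List[Tuple[np.ndarray, int]] = []

    def write(self, h_i: np.ndarray, x_next: int) -> None:
        # No transformation: the hidden state is stored exactly as produced.
        self.slots.append((h_i, x_next))

    def read(self) -> List[Tuple[np.ndarray, int]]:
        # Reading likewise returns the raw stored pairs.
        return self.slots
```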

More formally, it exploits the hidden representations $h_t$ to define a probability distribution over the words in the cache. As illustrated in the Figure, the cache stores pairs $(h_i, x_{i+1})$ of a hidden representation and the word that was generated based on this representation (the vector $h_i$ encodes the history $x_i, \dots, x_1$). At time $t$, we then define a probability distribution over the words stored in the cache, based on the stored hidden representations and the current one $h_t$, as:

$$p_{\text{cache}}\left(w \mid h_{1 \dots t}, x_{1 \dots t}\right) \propto \sum_{i=1}^{t-1} \mathbb{1}_{\{w = x_{i+1}\}} \exp\left(\theta\, h_t^{\top} h_i\right)$$

where the scalar $\theta$ is a parameter which controls the flatness of the distribution. When $\theta$ is equal to zero, the probability distribution over the history is uniform, and the model is equivalent to a unigram cache model.
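To make the formula concrete, here is a minimal NumPy sketch of the cache distribution above (the function name, default vector sizes, and toy values are assumptions for illustration, not from the paper). Each cache slot storing word $w$ contributes $\exp(\theta\, h_t^{\top} h_i)$ to that word's unnormalized probability, and the result is normalized at the end.

```python
import numpy as np

def cache_distribution(h_t, cached_states, cached_words, vocab_size, theta):
    """Neural cache distribution p_cache(w | h_1..t, x_1..t).

    cached_states: hidden vectors h_1 .. h_{t-1}
    cached_words:  word ids x_2 .. x_t paired with those states
    theta:         flatness parameter; theta = 0 yields uniform weights over
                   the history, i.e. a unigram cache model
    """
    scores = np.array([theta * float(np.dot(h_t, h_i)) for h_i in cached_states])
    weights = np.exp(scores - scores.max())  # subtract max for numerical stability
    p = np.zeros(vocab_size)
    for word, weight in zip(cached_words, weights):
        p[word] += weight  # indicator sum: every slot storing this word contributes
    return p / p.sum()

# Toy usage with random vectors (dimensions and values are arbitrary).
rng = np.random.default_rng(0)
states = [rng.standard_normal(8) for _ in range(5)]  # h_1 .. h_5
words = [3, 1, 3, 2, 0]                              # x_2 .. x_6
p = cache_distribution(rng.standard_normal(8), states, words, vocab_size=5, theta=1.0)
print(p, p.sum())  # a valid distribution over the 5-word vocabulary
```

Subtracting the maximum score before exponentiating only changes the proportionality constant, so the normalized distribution matches the equation while avoiding overflow for large $\theta\, h_t^{\top} h_i$.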