AICurious Logo

What is: Gated Linear Unit?

SourceLanguage Modeling with Gated Convolutional Networks
Year2000
Data SourceCC BY-SA - https://paperswithcode.com

A Gated Linear Unit, or GLU computes:

GLU(a,b)=aσ(b)\text{GLU}\left(a, b\right) = a\otimes \sigma\left(b\right)

It is used in natural language processing architectures, for example the Gated CNN, because here bb is the gate that control what information from aa is passed up to the following layer. Intuitively, for a language modeling task, the gating mechanism allows selection of words or features that are important for predicting the next word. The GLU also has non-linear capabilities, but has a linear path for the gradient so diminishes the vanishing gradient problem.