
What is: Gaussian Gated Linear Network?

Source: Gaussian Gated Linear Networks
Year: 2020
Data Source: CC BY-SA - https://paperswithcode.com

The Gaussian Gated Linear Network, or G-GLN, is a multivariate extension of the recently proposed GLN family of deep neural networks, obtained by reformulating the GLN neuron as a gated product of Gaussians. The G-GLN formulation exploits the fact that exponential-family densities are closed under multiplication, a property that has seen much use in the Gaussian process and related literature. As in the Bernoulli GLN, every neuron in a G-GLN directly predicts the target distribution.
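The closure property can be made concrete with a small numerical sketch (the function name is our own, not from the paper): the product of Gaussian PDFs, once renormalized, is again Gaussian, with precision equal to the sum of the input precisions and mean equal to the precision-weighted average of the input means.

```python
import numpy as np

def product_of_gaussians(mus, sigma2s):
    """Renormalized product of 1-D Gaussian PDFs; returns (mu, sigma^2).

    Precisions (1 / sigma^2) add, and the mean is the precision-weighted
    average of the input means.
    """
    precisions = 1.0 / np.asarray(sigma2s)
    sigma2 = 1.0 / precisions.sum()
    mu = sigma2 * (precisions * np.asarray(mus)).sum()
    return mu, sigma2

# Two Gaussian "experts" with different opinions about the target:
mu, sigma2 = product_of_gaussians(mus=[0.0, 2.0], sigma2s=[1.0, 1.0])
print(mu, sigma2)  # 1.0 0.5 -- sharper than either input
```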

Precisely, a G-GLN is a feed-forward network of data-dependent distributions. Each neuron calculates the sufficient statistics $(\mu, \sigma^2)$ for its associated PDF using its active weights, given those emitted by neurons in the preceding layer. The network consists of $L+1$ layers indexed by $i \in \{0, \ldots, L\}$, with $K_i$ neurons in each layer. The weight space for a neuron in layer $i$ is denoted by $\mathcal{W}_i$; the subscript is needed since the dimension of the weight space depends on $K_{i-1}$. Each neuron/distribution is indexed by its position in the network when laid out on a grid; for example, $f_{ik}$ refers to the family of PDFs defined by the $k$th neuron in the $i$th layer. Similarly, $c_{ik}$ refers to the context function associated with each neuron in layers $i \geq 1$, while $\mu_{ik}$ and $\sigma_{ik}^2$ (or $\Sigma_{ik}$ in the multivariate case) refer to the sufficient statistics of each Gaussian PDF.
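A minimal sketch of a single neuron under these definitions follows (class and parameter names are hypothetical; the random half-space gating and weight shapes are illustrative assumptions, not the paper's reference implementation). The context function $c_{ik}$ maps the side information $z$ to one of $2^{c}$ weight vectors, and the active weights exponentiate-and-multiply the incoming Gaussians, which reduces to a weighted sum in precision space:

```python
import numpy as np

class GGLNNeuron:
    """One G-GLN neuron: a gated, weighted product of Gaussians (a sketch).

    The context function uses `n_ctx` random half-spaces over the side
    information z; each of the 2**n_ctx contexts owns its own weight
    vector over the K_{i-1} PDFs coming from the previous layer.
    """

    def __init__(self, n_inputs, side_dim, n_ctx=4, seed=0):
        rng = np.random.default_rng(seed)
        self.hyperplanes = rng.normal(size=(n_ctx, side_dim))
        self.biases = rng.normal(size=n_ctx)
        # One weight vector per context; initialized to an even mixture.
        self.weights = np.full((2 ** n_ctx, n_inputs), 1.0 / n_inputs)

    def context(self, z):
        """Half-space gating: bit pattern -> index of the active weights."""
        bits = (self.hyperplanes @ z > self.biases).astype(int)
        return int(bits @ (2 ** np.arange(len(bits))))

    def forward(self, z, mus, sigma2s):
        """Weighted product of the input Gaussians using the active weights."""
        w = self.weights[self.context(z)]
        precisions = w / np.asarray(sigma2s)   # weighted precisions add
        sigma2 = 1.0 / precisions.sum()
        mu = sigma2 * (precisions * np.asarray(mus)).sum()
        return mu, sigma2

# Hypothetical usage: side information z and two input PDFs.
neuron = GGLNNeuron(n_inputs=2, side_dim=3)
mu, sigma2 = neuron.forward(np.array([0.1, -0.4, 0.7]),
                            mus=[0.0, 2.0], sigma2s=[1.0, 1.0])
```

Training (not shown) adjusts only the active weight vector per example, which is what makes the gating data-dependent.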

There are two types of input to a neuron in the network. The first is the side information, which can be thought of as the input features and is used to determine the weights each neuron uses via half-space gating. The second is the input to the neuron proper: the PDFs output by the previous layer or, in the case of layer 0, some provided base models. To apply a G-GLN in a supervised learning setting, we need to map the sequence of input-label pairs $(x_t, y_t)$ for $t = 1, 2, \ldots$ onto a sequence of (side information, base Gaussian PDFs, label) triplets $(z_t, (f_{0i})_i, y_t)$. The side information $z_t$ is set to the (potentially normalized) input features $x_t$. The Gaussian PDFs for layer 0 will generally include the base Gaussian PDFs necessary to span the target range, and optionally some base prediction PDFs that capture domain-specific knowledge.
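As a sketch of this supervised mapping (the target range, number of base PDFs, and normalization are illustrative choices, not prescribed by the paper), each pair $(x_t, y_t)$ becomes side information $z_t = x_t$ plus a set of fixed base Gaussians whose means tile the target range:

```python
import numpy as np

def make_triplet(x_t, y_t, target_lo=-5.0, target_hi=5.0, n_base=4):
    """Map an (input, label) pair to a (side info, base PDFs, label) triplet.

    Side information is the (optionally normalized) feature vector; the
    layer-0 inputs are base Gaussians whose means span the target range.
    """
    # Simple per-example normalization (an illustrative choice).
    z_t = (x_t - x_t.mean()) / (x_t.std() + 1e-8)
    base_mus = np.linspace(target_lo, target_hi, n_base)
    base_sigma2s = np.full(n_base, (target_hi - target_lo) ** 2 / n_base)
    return z_t, list(zip(base_mus, base_sigma2s)), y_t

z, base_pdfs, y = make_triplet(np.array([0.3, -1.2, 2.0]), y_t=1.5)
```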