
What is: Gated Linear Network?

Source: Gated Linear Networks
Year: 2019
Data Source: CC BY-SA - https://paperswithcode.com

A Gated Linear Network, or GLN, is a type of backpropagation-free neural architecture. What distinguishes GLNs from contemporary neural networks is the distributed and local nature of their credit assignment mechanism; each neuron directly predicts the target, forgoing the ability to learn feature representations in favor of rapid online learning. Individual neurons can model nonlinear functions via the use of data-dependent gating in conjunction with online convex optimization.

GLNs are feedforward networks composed of many layers of gated geometric mixing neurons. Each neuron in a given layer outputs a gated geometric mixture of the predictions from the previous layer, with the final layer consisting of a single neuron. In a supervised learning setting, a GLN is trained on (side information, base predictions, label) triplets $(z_t, p_t, x_t)_{t=1,2,3,\ldots}$ derived from input-label pairs $(z_t, x_t)$. There are two types of input to a neuron: the first is the side information $z_t$, which can be thought of as the input features; the second is the input to the neuron proper, namely the predictions output by the previous layer or, in the case of layer 0, some (optionally) provided base predictions $p_t$ that are typically a function of $z_t$. Each neuron also takes in a constant bias prediction, which helps empirically and is essential for universality guarantees.
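Concretely, geometric mixing with weights $w$ maps a vector of input probabilities $p$ to $\sigma(w^\top \operatorname{logit}(p))$, where $\sigma$ is the sigmoid and $\operatorname{logit}(p) = \log(p/(1-p))$ is applied elementwise; the gating function $c(z)$ selects which weight vector is active for a given side information $z$. The following is a minimal sketch of one such neuron, assuming halfspace gating via random hyperplanes; the class and method names (`GatedNeuron`, `context`, `predict`) and the hyperparameter choices are illustrative, not the paper's reference implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def logit(p):
    return np.log(p) - np.log1p(-p)

class GatedNeuron:
    """One gated geometric mixing neuron with halfspace gating (sketch)."""

    def __init__(self, n_inputs, side_dim, n_halfspaces=4):
        # Fixed random hyperplanes define the context function c(z).
        self.hyperplanes = rng.standard_normal((n_halfspaces, side_dim))
        # One weight vector per context; 2**n_halfspaces contexts in total.
        self.weights = np.full((2 ** n_halfspaces, n_inputs), 1.0 / n_inputs)

    def context(self, z):
        # Encode the signs of <h_i, z> as a single integer context index.
        bits = (self.hyperplanes @ z > 0).astype(int)
        return int(bits @ (2 ** np.arange(bits.size)))

    def predict(self, p, z):
        # Geometric mixing: sigmoid of the context-selected weighted logits.
        w = self.weights[self.context(z)]
        return sigmoid(w @ logit(p))
```

Because the gating is a fixed function of $z$ and the mixing weights enter only through the inner product, each neuron's prediction is nonlinear in $z$ overall while remaining convex in the active weights, which is what makes purely local online learning possible.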

Weights are learnt in a Gated Linear Network using Online Gradient Descent (OGD) locally at each neuron. The key observation is that since each neuron $(i, k)$ in layers $i>0$ is itself a gated geometric mixture, all of these neurons can be thought of as individually predicting the target. Given side information $z$, each neuron $(i, k)$ suffers a loss, convex in its active weights $u := w_{i k c_{ik}(z)}$, of

$$\ell_t(u) := -\log\left(\operatorname{GEO}_u\left(x_t ;\, p_{i-1}\right)\right)$$
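For binary targets this loss has a simple closed-form gradient: with $q = \operatorname{logit}(p_{i-1})$, $\nabla_u \ell_t(u) = (\sigma(u^\top q) - x_t)\, q$, so each neuron can run plain OGD against its own prediction error with no backpropagated signal. A minimal sketch extending the hypothetical `GatedNeuron` above, where the learning rate, clipping ranges, and the method name `update` are illustrative assumptions:

```python
    def update(self, p, z, x, lr=0.01, eps=1e-4, w_max=10.0):
        """One local OGD step on this neuron's own log loss.

        p: probabilities from the previous layer; x: binary target in {0, 1}.
        No error signal is passed between layers; credit assignment is local.
        """
        k = self.context(z)
        p = np.clip(p, eps, 1.0 - eps)   # keep logits finite
        q = logit(p)
        pred = sigmoid(self.weights[k] @ q)
        # Gradient of -log GEO_u(x; p) with respect to the active weights u.
        self.weights[k] -= lr * (pred - x) * q
        # Projection step: keep the active weights in a bounded hypercube.
        np.clip(self.weights[k], -w_max, w_max, out=self.weights[k])
        return pred
```

A hypothetical end-to-end usage, stacking neurons into layers and training online on a toy stream (the constant bias prediction is omitted for brevity):

```python
layers = [[GatedNeuron(n_inputs=3, side_dim=3) for _ in range(4)],
          [GatedNeuron(n_inputs=4, side_dim=3)]]
for _ in range(1000):
    z = rng.standard_normal(3)
    x = float(z.sum() > 0)   # toy binary target
    p = sigmoid(z)           # toy base predictions p_t as a function of z_t
    for layer in layers:
        p = np.array([neuron.update(p, z, x) for neuron in layer])
```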