What is: PowerSGD?
Source | PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization |
Year | 2019 |
Data Source | CC BY-SA - https://paperswithcode.com |
PowerSGD is a distributed optimization technique that compresses gradients by computing a low-rank approximation with a generalized power iteration (also known as subspace iteration). The approximation is computationally lightweight, avoiding a prohibitively expensive singular value decomposition (SVD). To improve the quality of this efficient approximation, the authors warm-start the power iteration by reusing the approximation from the previous optimization step.
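To make this concrete, below is a minimal single-worker NumPy sketch of one rank-r compression step: a warm-started power iteration (multiply the gradient matrix by Q, orthogonalise, multiply by the transpose), with the product P Qᵀ serving as the low-rank reconstruction. The function names and the `rank` parameter are illustrative, not the paper's API; the full method additionally all-reduces P and Q across workers and applies error feedback.

```python
import numpy as np


def powersgd_compress(grad, q_prev, rank=4):
    """One PowerSGD-style compression step (illustrative sketch).

    grad:   2-D gradient matrix of shape (n, m)
    q_prev: warm-start matrix Q of shape (m, rank) from the previous step
    Returns (p, q) such that p @ q.T approximates grad.
    """
    # Single step of subspace (power) iteration, warm-started from q_prev.
    p = grad @ q_prev                # (n, rank)
    p, _ = np.linalg.qr(p)           # orthonormalise P -- cheap, no full SVD
    q = grad.T @ p                   # (m, rank)
    return p, q


def powersgd_decompress(p, q):
    # Reconstruct the rank-r approximation of the gradient.
    return p @ q.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grad = rng.standard_normal((256, 128))
    rank = 4
    q = rng.standard_normal((128, rank))   # random init on the very first step

    # Reusing q across steps is the warm start; iterating here on a fixed
    # gradient shows the relative approximation error shrinking.
    for step in range(3):
        p, q = powersgd_compress(grad, q, rank)
        approx = powersgd_decompress(p, q)
        err = np.linalg.norm(grad - approx) / np.linalg.norm(grad)
        print(f"step {step}: relative error {err:.3f}")
```

In a distributed setting, each worker would compress its local gradient this way and average only the small P and Q matrices, which is what makes the scheme communication-efficient.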