What is: Non-Local Operation?
Source | Non-local Neural Networks |
Year | 2017 |
Data Source | CC BY-SA - https://paperswithcode.com |
A Non-Local Operation is a component for capturing long-range dependencies with deep neural networks. It is a generalization of the classical non-local mean operation in computer vision. Intuitively, a non-local operation computes the response at a position as a weighted sum of the features at all positions in the input feature maps. The set of positions can be in space, time, or spacetime, implying that these operations are applicable to image, sequence, and video problems.
Following the non-local mean operation, a generic non-local operation for deep neural networks is defined as:
$$\mathbf{y}_i = \frac{1}{\mathcal{C}(\mathbf{x})} \sum_{\forall j} f\left(\mathbf{x}_i, \mathbf{x}_j\right) g\left(\mathbf{x}_j\right)$$

Here $i$ is the index of an output position (in space, time, or spacetime) whose response is to be computed and $j$ is the index that enumerates all possible positions. $\mathbf{x}$ is the input signal (image, sequence, video; often their features) and $\mathbf{y}$ is the output signal of the same size as $\mathbf{x}$. A pairwise function $f$ computes a scalar (representing a relationship such as affinity) between $\mathbf{x}_i$ and all $\mathbf{x}_j$. The unary function $g$ computes a representation of the input signal at the position $j$. The response is normalized by a factor $\mathcal{C}(\mathbf{x})$.
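For concreteness, here is a minimal PyTorch sketch of the generic operation, under simplifying assumptions that are ours rather than the paper's: $f$ is a plain dot product, $g$ is the identity, and $\mathcal{C}(\mathbf{x})$ is the number of positions $N$. The function name is hypothetical.

```python
import torch

def non_local_dot_product(x: torch.Tensor) -> torch.Tensor:
    """Generic non-local operation on features of shape (N, C), where N is
    the number of positions (space, time, or spacetime) and C the channels.
    Assumptions: f(x_i, x_j) = x_i . x_j, g is the identity, C(x) = N."""
    n = x.shape[0]
    f = x @ x.t()        # pairwise affinities f(x_i, x_j), shape (N, N)
    y = (f @ x) / n      # y_i = (1 / C(x)) * sum_j f(x_i, x_j) * g(x_j)
    return y

x = torch.randn(16, 64)           # 16 positions, 64 channels
y = non_local_dot_product(x)
assert y.shape == x.shape         # output has the same size as the input
```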
The non-local behavior is due to the fact that all positions ($\forall j$) are considered in the operation. As a comparison, a convolutional operation sums up the weighted input in a local neighborhood (e.g., $i-1 \le j \le i+1$ in a 1D case with kernel size 3), and a recurrent operation at time $i$ is often based only on the current and the latest time steps (e.g., $j = i$ or $j = i-1$).
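This locality difference is easy to check numerically. In the sketch below (plain PyTorch, an illustration rather than anything from the paper), perturbing a single position of a 1D signal changes only the two neighboring outputs of a kernel-size-3 convolution; under the non-local sketch above, the same perturbation would change every output, since each $\mathbf{y}_i$ sums over all $j$.

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 10)          # a 1D signal with 10 positions
w = torch.randn(1, 1, 3)           # convolution kernel of size 3
y = F.conv1d(x, w, padding=1)

x2 = x.clone()
x2[0, 0, 9] += 1.0                 # perturb the last position
y2 = F.conv1d(x2, w, padding=1)

# Only outputs 8 and 9 change: the conv neighborhood is i-1 <= j <= i+1.
print((y2 - y).abs().squeeze() > 1e-6)
```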
The non-local operation is also different from a fully-connected (fc) layer. The equation above computes responses based on relationships between different locations, whereas fc uses learned weights. In other words, the relationship between $\mathbf{x}_j$ and $\mathbf{x}_i$ is not a function of the input data in fc, unlike in non-local layers. Furthermore, the formulation in the equation above supports inputs of variable sizes and maintains the corresponding size in the output. In contrast, an fc layer requires a fixed-size input/output and loses positional correspondence (e.g., that from $\mathbf{x}_i$ to $\mathbf{y}_i$ at the position $i$).
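The variable-size point can be demonstrated directly. The sketch below (our simplified dot-product form again) runs unchanged on inputs with different numbers of positions, while an fc layer is tied to the size fixed at construction time:

```python
import torch
import torch.nn as nn

def non_local(x):
    # Simplified form: f is a dot product, g the identity, C(x) = N.
    return (x @ x.t() @ x) / x.shape[0]

non_local(torch.randn(8, 64))      # 8 positions: works
non_local(torch.randn(32, 64))     # 32 positions: same code still works

fc = nn.Linear(8 * 64, 8 * 64)     # fc fixes the input/output size up front
fc(torch.randn(8 * 64))            # OK for exactly 8 positions
# fc(torch.randn(32 * 64))         # would raise a shape-mismatch error
```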
A non-local operation is a flexible building block and can be easily used together with convolutional/recurrent layers. It can be added into the earlier part of deep neural networks, unlike fc layers that are often used in the end. This allows us to build a richer hierarchy that combines both non-local and local information.
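As an illustration of this flexibility, the sketch below interleaves a residual non-local layer with ordinary convolutions. The layer is a simplification of the paper's non-local block (dot-product $f$, identity $g$, and no learned output projection $W_z$), so treat it as a sketch rather than the paper's exact design:

```python
import torch
import torch.nn as nn

class SimpleNonLocal2d(nn.Module):
    """Residual non-local layer for 2D feature maps. Simplified: f is a
    dot product, g the identity, and normalization is by the number of
    positions; the paper's block also applies a learned projection W_z."""
    def forward(self, x):
        n, c, h, w = x.shape
        flat = x.flatten(2)                          # (N, C, HW)
        f = flat.transpose(1, 2) @ flat              # (N, HW, HW) affinities
        y = (f @ flat.transpose(1, 2)) / (h * w)     # weighted sum over all j
        return x + y.transpose(1, 2).reshape(n, c, h, w)  # residual connection

# Non-local and convolutional (local) layers combined in one stack:
net = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    SimpleNonLocal2d(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
)
out = net(torch.randn(2, 3, 16, 16))                 # shape preserved
```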
In terms of parameterisation, we usually parameterise $g$ as a linear embedding of the form $g(\mathbf{x}_j) = W_g \mathbf{x}_j$, where $W_g$ is a weight matrix to be learned. This is implemented as, e.g., a 1×1 convolution in space or a 1×1×1 convolution in spacetime. For $f$ we use an affinity function; the original paper considers the Gaussian, Embedded Gaussian, Dot Product, and Concatenation forms.
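Putting the pieces together, here is a sketch of the Embedded Gaussian parameterisation, with $g$, $\theta$, and $\phi$ implemented as 1×1 convolutions. For this affinity the paper sets $\mathcal{C}(\mathbf{x}) = \sum_{\forall j} f(\mathbf{x}_i, \mathbf{x}_j)$, which turns the normalized weights into a softmax; the channel halving is the paper's bottleneck choice, and the module below omits the final $W_z$ projection that the full non-local block uses to restore the channel count:

```python
import torch
import torch.nn as nn

class EmbeddedGaussianNonLocal(nn.Module):
    """Non-local operation with f(x_i, x_j) = exp(theta(x_i)^T phi(x_j))
    and g(x_j) = W_g x_j, all embeddings as 1x1 convolutions."""
    def __init__(self, channels: int):
        super().__init__()
        inner = channels // 2                        # bottleneck, as in the paper
        self.theta = nn.Conv2d(channels, inner, 1)   # embeds x_i
        self.phi = nn.Conv2d(channels, inner, 1)     # embeds x_j
        self.g = nn.Conv2d(channels, inner, 1)       # g as a 1x1 convolution

    def forward(self, x):
        n, c, h, w = x.shape
        t = self.theta(x).flatten(2).transpose(1, 2)   # (N, HW, C/2)
        p = self.phi(x).flatten(2)                     # (N, C/2, HW)
        gx = self.g(x).flatten(2).transpose(1, 2)      # (N, HW, C/2)
        # Dividing by C(x) = sum_j f(x_i, x_j) makes each row a softmax.
        attn = torch.softmax(t @ p, dim=-1)            # (N, HW, HW)
        y = attn @ gx                                  # (N, HW, C/2)
        return y.transpose(1, 2).reshape(n, c // 2, h, w)

layer = EmbeddedGaussianNonLocal(64)
y = layer(torch.randn(2, 64, 8, 8))                    # -> (2, 32, 8, 8)
```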