What is: DExTra?
Source | DeLighT: Deep and Light-weight Transformer |
Year | 2020 |
Data Source | CC BY-SA - https://paperswithcode.com |
DExTra, or Deep and Light-weight Expand-reduce Transformation, is a light-weight expand-reduce transformation that enables learning wider representations efficiently.
DExTra maps a $d_m$-dimensional input vector into a high-dimensional space (expansion) and then reduces it down to a $d_o$-dimensional output vector (reduction) using $N$ layers of group transformations. During these expansion and reduction phases, DExTra uses group linear transformations because they learn local representations by deriving the output from a specific part of the input and are more efficient than linear transformations. To learn global representations, DExTra shares information between different groups in the group linear transformation using feature shuffling.
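The two building blocks named above, a group linear transformation and feature shuffling, can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the function names `group_linear`, `feature_shuffle`, and `param_count` are hypothetical, and the sketch assumes the input is split into equal-sized contiguous groups.

```python
def group_linear(x, weights, biases):
    """Apply an independent linear map to each contiguous group of x.

    weights[k] is the weight matrix of group k (a list of rows) and
    biases[k] is its bias vector. Equal-sized groups are an assumption.
    """
    g = len(weights)
    if len(x) % g != 0:
        raise ValueError("input length must be divisible by the number of groups")
    size = len(x) // g
    out = []
    for k in range(g):
        chunk = x[k * size:(k + 1) * size]  # the part of the input seen by group k
        for row, b in zip(weights[k], biases[k]):
            out.append(sum(w * v for w, v in zip(row, chunk)) + b)
    return out  # group outputs, concatenated in group order


def feature_shuffle(y, g):
    """Interleave features so the next layer's groups mix all current groups."""
    if len(y) % g != 0:
        raise ValueError("length must be divisible by the number of groups")
    size = len(y) // g
    # View y as a (g x size) grid and read it column by column.
    return [y[k * size + j] for j in range(size) for k in range(g)]


def param_count(d_in, d_out, g):
    """Weights plus biases of a group linear layer with g equal groups."""
    return (d_in // g) * (d_out // g) * g + d_out


x = [1.0, 2.0, 3.0, 4.0]
weights = [[[1.0, 1.0]], [[1.0, -1.0]]]  # two groups, each mapping 2 inputs to 1 output
biases = [[0.0], [0.5]]
y = group_linear(x, weights, biases)  # [3.0, -0.5]
```

Under these assumptions, a layer mapping 8 inputs to 16 outputs needs 144 parameters as a single linear transformation but only 48 with four groups, which illustrates where the efficiency gain comes from.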
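Formally, the DExTra transformation is controlled by five configuration parameters: (1) depth $N$, (2) width multiplier $m_w$, (3) input dimension $d_m$, (4) output dimension $d_o$, and (5) maximum groups $g_{max}$ in a group linear transformation. In the expansion phase, DExTra projects the $d_m$-dimensional input to a high-dimensional space, $d_{max} = m_w d_m$, linearly using $\lceil N/2 \rceil$ layers. In the reduction phase, DExTra projects the $d_{max}$-dimensional vector to a $d_o$-dimensional space using the remaining $N - \lceil N/2 \rceil$ layers. Mathematically, the output $\mathbf{Y}^l$ at each layer $l$ is defined as:

$$ \mathbf{Y}^l = \begin{cases} \mathcal{F}\left(\mathbf{X}, \mathbf{W}^l, \mathbf{b}^l, g^l\right), & l = 1 \\ \mathcal{F}\left(\mathcal{H}\left(\mathbf{X}, \mathbf{Y}^{l-1}\right), \mathbf{W}^l, \mathbf{b}^l, g^l\right), & \text{otherwise} \end{cases} $$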
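where the number of groups at each layer $l$ is computed as:

$$ g^l = \begin{cases} \min\left(2^{l-1}, g_{max}\right), & l \leq \lceil N/2 \rceil \\ g^{N-l}, & \text{otherwise} \end{cases} $$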
In the above equations, $\mathcal{F}$ is a group linear transformation function. The function $\mathcal{F}$ takes the input ($\mathbf{X}$ or $\mathcal{H}\left(\mathbf{X}, \mathbf{Y}^{l-1}\right)$), splits it into $g^l$ groups, and then applies a linear transformation with learnable parameters $\mathbf{W}^l$ and bias $\mathbf{b}^l$ to each group independently. The outputs of each group are then concatenated to produce the final output $\mathbf{Y}^l$. The function $\mathcal{H}$ first shuffles the output of each group in $\mathbf{Y}^{l-1}$ and then combines it with the input $\mathbf{X}$ using an input mixer connection.
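How the configuration parameters translate into a per-layer layout can be sketched as follows. This is a hedged illustration rather than the authors' code: the function name `dextra_schedule` and its argument names are hypothetical. It assumes that the expansion phase uses the first ceil(N/2) layers, that the group count doubles at each expansion layer until it reaches the maximum, and that the reduction phase mirrors the expansion phase through the index `depth - l + 1`, an illustrative choice that maps the last layer back to the first rather than to a nonexistent layer 0.

```python
import math


def dextra_schedule(depth, width_multiplier, d_in, max_groups):
    """Return (d_max, n_expand, n_reduce, groups_per_layer) for a DExTra-style stack."""
    if depth < 1 or max_groups < 1:
        raise ValueError("depth and max_groups must be positive")
    d_max = width_multiplier * d_in   # widest dimension reached by the expansion phase
    n_expand = math.ceil(depth / 2)   # layers used for expansion
    n_reduce = depth - n_expand       # remaining layers used for reduction
    groups = {}
    for l in range(1, depth + 1):     # layers are 1-indexed, as in the text
        if l <= n_expand:
            groups[l] = min(2 ** (l - 1), max_groups)
        else:
            # Assumption: reuse the group count of the mirrored expansion layer.
            groups[l] = groups[depth - l + 1]
    return d_max, n_expand, n_reduce, [groups[l] for l in range(1, depth + 1)]


d_max, n_expand, n_reduce, groups = dextra_schedule(
    depth=5, width_multiplier=2, d_in=128, max_groups=4
)
# d_max == 256, n_expand == 3, n_reduce == 2, groups == [1, 2, 4, 2, 1]
```

With `max_groups=1` every layer uses a single group, which corresponds to the plain multi-layer perceptron case.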
In their experiments, the authors use $g_{max} = \lceil d_m / 32 \rceil$ so that each group has at least 32 input elements. Note that (i) group linear transformations reduce to linear transformations when $g^l = 1$, and (ii) DExTra is equivalent to a multi-layer perceptron when $g_{max} = 1$.
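The choice of the maximum group count is simple arithmetic. The sketch below assumes the rule is the ceiling of the input dimension divided by 32; the function name `max_groups` and the `min_group_size` parameter are illustrative and not taken from the authors' code.

```python
import math


def max_groups(d_in, min_group_size=32):
    """Maximum group count, assumed to be ceil(d_in / min_group_size)."""
    return math.ceil(d_in / min_group_size)


g_max_512 = max_groups(512)  # 16 groups of 32 input elements each
g_max_128 = max_groups(128)  # 4
g_max_32 = max_groups(32)    # 1, the multi-layer perceptron case
```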