What is: DeLighT Block?
Source | DeLighT: Deep and Light-weight Transformer |
Year | 2020 |
Data Source | CC BY-SA - https://paperswithcode.com |
A DeLighT Block is the building block of the DeLighT transformer architecture. It uses a DExTra transformation to reduce the dimensionality of the vectors fed into the attention layer, where a single-head attention module is used. Because the DeLighT block learns wider representations of the input across its layers using DExTra, the authors can replace multi-head attention with single-head attention. This is followed by a light-weight FFN which, rather than expanding the dimension (standard Transformers widen it to four times the model dimension), imposes a bottleneck and squeezes it. Again, the reason for this is that the DExTra transformation has already incorporated wider representations, so the FFN can squeeze instead of expand.
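The PyTorch sketch below illustrates this data flow under stated assumptions; it is not the authors' implementation. In particular, the real DExTra is built from stacked group linear transformations with input mixing, which the `DExTraSketch` module here approximates with plain expand-and-reduce linear layers, and names such as `DExTraSketch`, `DeLighTBlockSketch`, `d_attn`, and `ffn_reduction` are illustrative choices, not from the paper.

```python
import torch
import torch.nn as nn


class DExTraSketch(nn.Module):
    """Simplified stand-in for the DExTra transformation.

    The real DExTra stacks group linear transformations with input
    mixing to first expand and then reduce the dimension; this sketch
    approximates that expand-and-reduce shape with plain linear layers.
    """

    def __init__(self, d_model, width_mult=2, d_out=None):
        super().__init__()
        # DExTra's output is narrower than its input, so the attention
        # layer that follows operates on smaller vectors.
        d_out = d_out if d_out is not None else d_model // 2
        self.expand = nn.Linear(d_model, width_mult * d_model)
        self.reduce = nn.Linear(width_mult * d_model, d_out)
        self.act = nn.GELU()

    def forward(self, x):
        return self.reduce(self.act(self.expand(x)))


class DeLighTBlockSketch(nn.Module):
    """Hypothetical DeLighT-style block:
    DExTra -> single-head attention -> light-weight (bottleneck) FFN."""

    def __init__(self, d_model=512, d_attn=256, ffn_reduction=4):
        super().__init__()
        self.dextra = DExTraSketch(d_model, d_out=d_attn)
        # Single-head attention: DExTra has already widened the
        # representations, so multiple heads are not needed.
        self.attn = nn.MultiheadAttention(d_attn, num_heads=1, batch_first=True)
        self.proj = nn.Linear(d_attn, d_model)  # back to the block's I/O width
        # Light-weight FFN: squeeze the dimension by `ffn_reduction`
        # instead of the usual 4x expansion.
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_model // ffn_reduction),
            nn.GELU(),
            nn.Linear(d_model // ffn_reduction, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        h = self.dextra(x)                # (B, T, d_attn), reduced width
        a, _ = self.attn(h, h, h)         # single-head self-attention
        x = self.norm1(x + self.proj(a))  # residual at d_model
        x = self.norm2(x + self.ffn(x))   # bottleneck FFN + residual
        return x


if __name__ == "__main__":
    block = DeLighTBlockSketch()
    out = block(torch.randn(2, 10, 512))
    print(out.shape)  # torch.Size([2, 10, 512])
```

Note the inverted shape relative to a standard Transformer block: attention runs on vectors narrower than the model dimension, and the FFN contracts rather than expands, since the widening work has already been done by DExTra.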