What is: Factorized Dense Synthesized Attention?
| Source | Synthesizer: Rethinking Self-Attention in Transformer Models |
| Year | 2020 |
| Data Source | CC BY-SA - https://paperswithcode.com |
Factorized Dense Synthesized Attention is a synthesized attention mechanism, similar to dense synthesized attention, but with the outputs factorized to reduce the parameter count and prevent overfitting. It was proposed as part of the Synthesizer architecture. The factorized variant of the dense synthesizer can be expressed as follows:

$$A, B = F_A(X_i), F_B(X_i)$$
where $F_A$ projects input $X_i$ into $a$ dimensions, $F_B$ projects $X_i$ to $b$ dimensions, and $a \times b = \ell$. The output of the factorized module is now written as:

$$Y = \text{Softmax}(C)G(X_i)$$
where $C = H_A(A) * H_B(B)$, with $H_A$, $H_B$ being tiling functions and $C \in \mathbb{R}^{\ell \times \ell}$. The tiling function simply duplicates the vector $k$ times, i.e., $\mathbb{R}^{\ell} \rightarrow \mathbb{R}^{\ell k}$. In this case, $H_A(\cdot)$ is a projection of $\mathbb{R}^{a} \rightarrow \mathbb{R}^{ab}$ and $H_B(\cdot)$ is a projection of $\mathbb{R}^{b} \rightarrow \mathbb{R}^{ba}$. To avoid having similar values within the same block, we compose the outputs of $H_A$ and $H_B$.
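The mechanism above can be sketched in a few lines of NumPy. This is a minimal, single-head illustration under assumed toy dimensions (`ell = 12`, `d = 16`, `a = 3`, `b = 4`) with random matrices standing in for the learned projections $F_A$, $F_B$, and $G$; it is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# Toy dimensions (assumptions): sequence length ell must factor as a * b.
ell, d, a, b = 12, 16, 3, 4
assert a * b == ell

X = rng.standard_normal((ell, d))  # input token representations

# F_A, F_B: per-token projections into a and b dimensions (random stand-ins
# for learned weights).
W_A = rng.standard_normal((d, a))
W_B = rng.standard_normal((d, b))
A = X @ W_A  # (ell, a)
B = X @ W_B  # (ell, b)

# H_A, H_B: tiling functions. Each row of A (length a) is duplicated b times
# and each row of B (length b) is duplicated a times, so both reach length
# a * b = ell.
HA = np.tile(A, (1, b))  # (ell, ell)
HB = np.tile(B, (1, a))  # (ell, ell)

# Composing the two tilings elementwise avoids repeated values within a block.
C = HA * HB  # (ell, ell) synthesized attention logits

# G: value projection; output Y = Softmax(C) G(X).
W_G = rng.standard_normal((d, d))
Y = softmax(C, axis=-1) @ (X @ W_G)

print(Y.shape)  # (12, 16)
```

Note the parameter saving: the dense synthesizer needs a projection producing all $\ell$ logits per token, while the factorized variant only learns projections into $a$ and $b$ dimensions with $a \times b = \ell$.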