What is: Switch FFN?
Source | Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity |
Year | 2021 |
Data Source | CC BY-SA - https://paperswithcode.com |
A Switch FFN is a sparse layer that operates independently on each token in an input sequence. It is shown in the blue block in the figure. We diagram two tokens (x_1 = “More” and x_2 = “Parameters” below) being routed (solid lines) across four FFN experts, where the router routes each token independently. The Switch FFN layer returns the output of the selected FFN multiplied by the router gate value (dotted line).
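The routing described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the function name `switch_ffn`, the ReLU two-layer experts, the softmax router, and all dimensions are assumptions made for the example, and practical details such as expert capacity limits and the auxiliary load-balancing loss are left out.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def switch_ffn(x, w_router, experts):
    """Top-1 routed FFN layer (hypothetical sketch).

    x:        (tokens, d_model) token representations
    w_router: (d_model, n_experts) router weights
    experts:  list of (w_in, w_out) weight pairs, one per expert
    """
    gates = softmax(x @ w_router)      # router probabilities per token
    choice = gates.argmax(axis=-1)     # each token picks exactly one expert
    out = np.zeros_like(x)
    for i, (w_in, w_out) in enumerate(experts):
        mask = choice == i             # tokens routed to expert i
        if not mask.any():
            continue
        h = np.maximum(x[mask] @ w_in, 0.0)   # assumed ReLU expert FFN
        # Scale the expert output by the gate value of the chosen expert.
        out[mask] = gates[mask, i][:, None] * (h @ w_out)
    return out, choice, gates

# Two tokens routed across four experts, mirroring the diagram.
rng = np.random.default_rng(0)
d_model, d_ff, n_experts = 8, 16, 4
x = rng.normal(size=(2, d_model))
w_router = rng.normal(size=(d_model, n_experts))
experts = [
    (rng.normal(size=(d_model, d_ff)), rng.normal(size=(d_ff, d_model)))
    for _ in range(n_experts)
]
y, choice, gates = switch_ffn(x, w_router, experts)
```

Because each token is routed on its own, passing a single token through `switch_ffn` should give the same row as passing it alongside other tokens, which is a quick way to check the "operates independently" property in this sketch.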