What is: Switch FFN?

Source: Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
Year: 2021
Data Source: CC BY-SA - https://paperswithcode.com

A Switch FFN is a sparse layer that operates independently on each token in an input sequence. It is shown in the blue block in the figure. We diagram two tokens ($x_1$ = “More” and $x_2$ = “Parameters”) being routed (solid lines) across four FFN experts, where the router routes each token independently. The Switch FFN layer returns the output of the selected FFN multiplied by the router gate value (dotted line).
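The routing described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the class `SwitchFFN`, its helper functions, the layer sizes, the random weights, and the stand-in token embeddings are all hypothetical, and the sketch leaves out details such as expert capacity limits and the auxiliary load-balancing loss. It only shows the core idea: the router scores each token, the single highest-scoring expert processes it, and the expert output is scaled by the router gate value.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a matrix (stored as a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def make_matrix(rng, rows, cols, scale=0.5):
    """Random weights for illustration only (assumption, not trained values)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class SwitchFFN:
    """Toy top-1 routing layer (hypothetical sketch; biases omitted)."""

    def __init__(self, d_model, d_ff, num_experts, seed=0):
        rng = random.Random(seed)
        # Router: one weight row per expert, giving one logit per expert.
        self.router = make_matrix(rng, num_experts, d_model)
        # Each expert is an independent two-layer FFN: (W_in, W_out).
        self.experts = [
            (make_matrix(rng, d_ff, d_model), make_matrix(rng, d_model, d_ff))
            for _ in range(num_experts)
        ]

    def expert_forward(self, index, x):
        """Apply expert `index` to one token: W_out * relu(W_in * x)."""
        w_in, w_out = self.experts[index]
        hidden = [max(0.0, h) for h in matvec(w_in, x)]
        return matvec(w_out, hidden)

    def route(self, x):
        """Return (selected expert index, router gate value) for one token."""
        probs = softmax(matvec(self.router, x))
        index = max(range(len(probs)), key=probs.__getitem__)
        return index, probs[index]

    def forward(self, tokens):
        """Route every token independently; scale expert output by its gate."""
        outputs, choices = [], []
        for x in tokens:
            index, gate = self.route(x)
            y = self.expert_forward(index, x)
            outputs.append([gate * v for v in y])
            choices.append(index)
        return outputs, choices


# Four experts, as in the figure; the embeddings below are made-up stand-ins.
layer = SwitchFFN(d_model=4, d_ff=8, num_experts=4)
x_more = [0.2, -0.1, 0.4, 0.3]          # stand-in for "More"
x_parameters = [-0.3, 0.5, 0.1, -0.2]   # stand-in for "Parameters"
outputs, choices = layer.forward([x_more, x_parameters])
print("selected experts:", choices)
```

Because each token is routed on its own, passing `x_more` alone through `layer.forward` yields the same output as passing it alongside `x_parameters`. Only one expert runs per token, so the compute per token stays roughly that of a single FFN regardless of how many experts the layer holds.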