
What is: Residual Multi-Layer Perceptrons?

Source: ResMLP: Feedforward networks for image classification with data-efficient training
Year: 2021
Data Source: CC BY-SA - https://paperswithcode.com

Residual Multi-Layer Perceptrons (ResMLP) is an architecture built entirely upon multi-layer perceptrons for image classification. It is a simple residual network that alternates (i) a linear layer in which image patches interact, independently and identically across channels, and (ii) a two-layer feed-forward network in which channels interact independently per patch. At the end of the network, the patch representations are average pooled and fed to a linear classifier.
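
A minimal PyTorch sketch of this structure is given below. The class and attribute names are illustrative, the dimensions (dim=384, 196 patches, depth 12) follow a ResMLP-S12-style configuration, and the 1e-4 post-normalization initialization is an assumption borrowed from LayerScale; this is a sketch, not the reference implementation. The Aff normalization it uses is described in the next paragraph.

```python
import torch
import torch.nn as nn


class Aff(nn.Module):
    """Per-channel affine rescaling used in place of LayerNorm (detailed below)."""
    def __init__(self, dim, init_alpha=1.0):
        super().__init__()
        self.alpha = nn.Parameter(init_alpha * torch.ones(dim))
        self.beta = nn.Parameter(torch.zeros(dim))

    def forward(self, x):
        return self.alpha * x + self.beta


class ResMLPBlock(nn.Module):
    """One residual block: (i) cross-patch linear layer, (ii) per-patch two-layer MLP."""
    def __init__(self, dim, num_patches, mlp_ratio=4, init_scale=1e-4):
        super().__init__()
        self.aff1 = Aff(dim)
        self.patch_mix = nn.Linear(num_patches, num_patches)  # patches interact, identically across channels
        self.scale1 = Aff(dim, init_alpha=init_scale)          # post-normalization, LayerScale-like
        self.aff2 = Aff(dim)
        self.channel_mix = nn.Sequential(                      # channels interact, independently per patch
            nn.Linear(dim, mlp_ratio * dim),
            nn.GELU(),
            nn.Linear(mlp_ratio * dim, dim),
        )
        self.scale2 = Aff(dim, init_alpha=init_scale)

    def forward(self, x):                       # x: (batch, num_patches, dim)
        z = self.aff1(x).transpose(1, 2)        # (batch, dim, num_patches)
        z = self.patch_mix(z).transpose(1, 2)   # back to (batch, num_patches, dim)
        x = x + self.scale1(z)                  # residual 1: patch interaction
        x = x + self.scale2(self.channel_mix(self.aff2(x)))  # residual 2: channel MLP
        return x


class ResMLP(nn.Module):
    """Stack of blocks followed by average pooling and a linear classifier."""
    def __init__(self, dim=384, num_patches=196, depth=12, num_classes=1000):
        super().__init__()
        self.blocks = nn.Sequential(*[ResMLPBlock(dim, num_patches) for _ in range(depth)])
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):                       # x: patch embeddings, (batch, num_patches, dim)
        x = self.blocks(x)
        return self.head(x.mean(dim=1))         # average pool over patches


logits = ResMLP()(torch.randn(2, 196, 384))    # -> shape (2, 1000)
```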

Layer normalization is replaced with a simpler affine transformation; this choice is made possible by the absence of self-attention layers, which makes training more stable. The affine operator, $\operatorname{Aff}_{\mathbf{\alpha},\mathbf{\beta}}(x) = \operatorname{Diag}(\mathbf{\alpha})\,x + \mathbf{\beta}$ with learnable vectors $\mathbf{\alpha}$ and $\mathbf{\beta}$, is applied at the beginning ("pre-normalization") and end ("post-normalization") of each residual block. As a pre-normalization, Aff replaces LayerNorm without using any channel-wise statistics, and is initialized with $\mathbf{\alpha}=\mathbf{1}$ and $\mathbf{\beta}=\mathbf{0}$. As a post-normalization, Aff is similar to LayerScale, and $\mathbf{\alpha}$ is initialized with the same small value.
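
Concretely, the two uses of Aff differ only in how $\mathbf{\alpha}$ is initialized. A self-contained sketch follows; the 1e-4 value is an assumption taken from LayerScale's small-model setting, not stated in this excerpt.

```python
import torch
import torch.nn as nn


class Aff(nn.Module):
    """Aff(x) = alpha * x + beta, element-wise over channels.

    Unlike LayerNorm, no mean or variance statistics are computed.
    """
    def __init__(self, dim, init_alpha=1.0):
        super().__init__()
        self.alpha = nn.Parameter(init_alpha * torch.ones(dim))
        self.beta = nn.Parameter(torch.zeros(dim))

    def forward(self, x):
        return self.alpha * x + self.beta


pre_norm = Aff(dim=384)                     # alpha = 1, beta = 0
post_norm = Aff(dim=384, init_alpha=1e-4)   # small alpha, as in LayerScale (assumed value)
```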