
What is: Affine Operator?

Source: ResMLP: Feedforward networks for image classification with data-efficient training
Year: 2021
Data Source: CC BY-SA - https://paperswithcode.com

The Affine Operator is an affine transformation layer introduced in the ResMLP architecture. It replaces the layer normalization used in Transformer-based networks; this is possible because ResMLP contains no self-attention layers, which makes training more stable and allows a simpler affine transformation to be used instead.

The affine operator is defined as:

$$\operatorname{Aff}_{\boldsymbol{\alpha}, \boldsymbol{\beta}}(\mathbf{x}) = \operatorname{Diag}(\boldsymbol{\alpha})\,\mathbf{x} + \boldsymbol{\beta}$$

where $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$ are learnable weight vectors. The operation only rescales and shifts the input element-wise. It has several advantages over other normalization operations: first, unlike Layer Normalization, it has no cost at inference time, since it can be absorbed into the adjacent linear layer. Second, unlike BatchNorm and Layer Normalization, the Aff operator does not depend on batch statistics.
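
The following is a minimal sketch of the Aff operator and of how it can be folded into an adjacent linear layer at inference time. It assumes PyTorch; the class and function names (`Affine`, `absorb_affine`) are illustrative and not taken from the official ResMLP code.

```python
# Minimal sketch (assumes PyTorch): Aff(x) = Diag(alpha) x + beta,
# plus folding Aff into a following linear layer so it is free at inference.
import torch
import torch.nn as nn


class Affine(nn.Module):
    """Element-wise affine transform over the channel dimension."""

    def __init__(self, dim: int):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(dim))   # learnable rescaling
        self.beta = nn.Parameter(torch.zeros(dim))   # learnable shift

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Diag(alpha) x + beta, broadcast over the last dimension.
        return self.alpha * x + self.beta


def absorb_affine(aff: Affine, linear: nn.Linear) -> nn.Linear:
    """Fuse Aff followed by a linear layer into a single linear layer."""
    fused = nn.Linear(linear.in_features, linear.out_features, bias=True)
    with torch.no_grad():
        # W (Diag(alpha) x + beta) + b  ==  (W Diag(alpha)) x + (W beta + b)
        fused.weight.copy_(linear.weight * aff.alpha)
        bias = linear.bias if linear.bias is not None \
            else torch.zeros(linear.out_features)
        fused.bias.copy_(linear.weight @ aff.beta + bias)
    return fused


if __name__ == "__main__":
    dim, x = 8, torch.randn(4, 8)
    aff, lin = Affine(dim), nn.Linear(dim, 16)
    fused = absorb_affine(aff, lin)
    # The fused layer matches Aff followed by the original linear layer.
    assert torch.allclose(lin(aff(x)), fused(x), atol=1e-6)
```

The fusion step illustrates the "no cost at inference" claim: because Aff is a per-channel scale and shift, it commutes into the weight matrix and bias of the next linear layer, whereas Layer Normalization cannot be folded away since it depends on statistics of each input.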