
What is: Affine Operator?

Source: ResMLP: Feedforward networks for image classification with data-efficient training
Year: 2021
Data Source: CC BY-SA - https://paperswithcode.com

The Affine Operator is an affine transformation layer introduced in the ResMLP architecture. It replaces the layer normalization used in Transformer-based networks; this is possible because ResMLP contains no self-attention layers, which makes training more stable and allows a simpler affine transformation to be used instead.

The affine operator is defined as:

$$\operatorname{Aff}_{\boldsymbol{\alpha}, \boldsymbol{\beta}}(\mathbf{x}) = \operatorname{Diag}(\boldsymbol{\alpha})\,\mathbf{x} + \boldsymbol{\beta}$$

where $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$ are learnable weight vectors. The operation only rescales and shifts the input element-wise. It has several advantages over other normalization operations: first, unlike Layer Normalization, it has no cost at inference time, since it can be absorbed into the adjacent linear layer. Second, unlike BatchNorm and Layer Normalization, the Aff operator does not depend on batch statistics.
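
The following is a minimal sketch of the Aff operator and of how it can be folded into an adjacent linear layer at inference time. It assumes PyTorch; the class and function names (`Affine`, `absorb_affine`) are illustrative and not taken from the official ResMLP code.

```python
# Minimal sketch (assumes PyTorch): Aff(x) = Diag(alpha) x + beta,
# plus folding Aff into a following linear layer so it is free at inference.
import torch
import torch.nn as nn


class Affine(nn.Module):
    """Element-wise affine transform over the channel dimension."""

    def __init__(self, dim: int):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(dim))   # learnable rescaling
        self.beta = nn.Parameter(torch.zeros(dim))   # learnable shift

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Diag(alpha) x + beta, broadcast over the last dimension.
        return self.alpha * x + self.beta


def absorb_affine(aff: Affine, linear: nn.Linear) -> nn.Linear:
    """Fuse Aff followed by a linear layer into a single linear layer."""
    fused = nn.Linear(linear.in_features, linear.out_features, bias=True)
    with torch.no_grad():
        # W (Diag(alpha) x + beta) + b  ==  (W Diag(alpha)) x + (W beta + b)
        fused.weight.copy_(linear.weight * aff.alpha)
        bias = linear.bias if linear.bias is not None \
            else torch.zeros(linear.out_features)
        fused.bias.copy_(linear.weight @ aff.beta + bias)
    return fused


if __name__ == "__main__":
    dim, x = 8, torch.randn(4, 8)
    aff, lin = Affine(dim), nn.Linear(dim, 16)
    fused = absorb_affine(aff, lin)
    # The fused layer matches Aff followed by the original linear layer.
    assert torch.allclose(lin(aff(x)), fused(x), atol=1e-6)
```

The fusion step illustrates the "no cost at inference" claim: because Aff is a per-channel scale and shift, it commutes into the weight matrix and bias of the next linear layer, whereas Layer Normalization cannot be folded away since it depends on statistics of each input.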