What is: Spatial Transformer?

A Spatial Transformer is an image model block that explicitly allows the spatial manipulation of data within a convolutional neural network. It gives CNNs the ability to spatially transform feature maps, conditional on the feature map itself, without any extra training supervision or modification to the optimisation process. Unlike pooling layers, where the receptive fields are fixed and local, the spatial transformer module is a dynamic mechanism that produces an appropriate transformation for each input sample and applies it to an image (or a feature map). The transformation is performed on the entire feature map (non-locally) and can include scaling, cropping, rotation, and non-rigid deformations.

The architecture is shown in the figure to the right. The input feature map $U$ is passed to a localisation network, which regresses the transformation parameters $\theta$. The regular spatial grid $G$ over $V$ is transformed to the sampling grid $T_{\theta}\left(G\right)$, which is applied to $U$, producing the warped output feature map $V$. The combination of the localisation network and the sampling mechanism defines a spatial transformer.
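
The grid-and-sample step described above can be illustrated with a minimal pure-Python sketch. It makes several assumptions that are not stated in this entry: the transformation is restricted to a 2×3 affine matrix, coordinates are normalised to $[-1, 1]$, sampling is bilinear, and samples outside $U$ read as zero. The function names (`affine_grid`, `bilinear_sample`, `spatial_transformer`) are hypothetical, and $\theta$ is passed in by hand, whereas in a real network it would be regressed from $U$ by the localisation network.

```python
import math


def affine_grid(theta, out_h, out_w):
    """Build the sampling grid T_theta(G): for every cell of the regular
    output grid G (over V), compute the source location in U.
    Coordinates are normalised to [-1, 1] (an assumption of this sketch)."""
    grid = []
    for i in range(out_h):
        y_t = -1.0 + 2.0 * i / (out_h - 1) if out_h > 1 else 0.0
        row = []
        for j in range(out_w):
            x_t = -1.0 + 2.0 * j / (out_w - 1) if out_w > 1 else 0.0
            # Affine map: (x_s, y_s) = theta @ (x_t, y_t, 1)
            x_s = theta[0][0] * x_t + theta[0][1] * y_t + theta[0][2]
            y_s = theta[1][0] * x_t + theta[1][1] * y_t + theta[1][2]
            row.append((x_s, y_s))
        grid.append(row)
    return grid


def bilinear_sample(U, grid):
    """Sample the input feature map U (a list of rows) at the grid locations
    with bilinear interpolation; out-of-range neighbours contribute zero."""
    h, w = len(U), len(U[0])
    V = []
    for row in grid:
        out_row = []
        for x_s, y_s in row:
            # Convert normalised coordinates to pixel coordinates in U.
            x = (x_s + 1.0) * (w - 1) / 2.0
            y = (y_s + 1.0) * (h - 1) / 2.0
            x0, y0 = math.floor(x), math.floor(y)
            value = 0.0
            for yy in (y0, y0 + 1):
                for xx in (x0, x0 + 1):
                    if 0 <= yy < h and 0 <= xx < w:
                        weight = max(0.0, 1.0 - abs(x - xx)) * max(0.0, 1.0 - abs(y - yy))
                        value += weight * U[yy][xx]
            out_row.append(value)
        V.append(out_row)
    return V


def spatial_transformer(U, theta, out_h, out_w):
    """Warp U into V. theta is supplied directly here; a localisation
    network (not modelled in this sketch) would normally predict it."""
    return bilinear_sample(U, affine_grid(theta, out_h, out_w))


# Example: a 5x5 input and a theta that zooms in by a factor of 2.
U = [[r * 5 + c for c in range(5)] for r in range(5)]
zoom = [[0.5, 0.0, 0.0],
        [0.0, 0.5, 0.0]]
V = spatial_transformer(U, zoom, 3, 3)
```

With the `zoom` parameters the 3×3 output picks out the central 3×3 region of `U`, which is the cropping behaviour mentioned earlier. Because bilinear sampling is (sub-)differentiable with respect to both `U` and the grid coordinates, the same computation written in an automatic-differentiation framework lets gradients flow back to whatever produces $\theta$.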

Source	Spatial Transformer Networks
Year	2015
Data Source	CC BY-SA - https://paperswithcode.com