What is: Spatial Transformer?

Source: Spatial Transformer Networks
Year: 2015
Data Source: CC BY-SA - https://paperswithcode.com

A Spatial Transformer is an image model block that explicitly allows the spatial manipulation of data within a convolutional neural network. It gives CNNs the ability to actively spatially transform feature maps, conditional on the feature map itself, without any extra training supervision or modification to the optimisation process. Unlike pooling layers, where the receptive fields are fixed and local, the spatial transformer module is a dynamic mechanism that can actively spatially transform an image (or a feature map) by producing an appropriate transformation for each input sample. The transformation is then performed on the entire feature map (non-locally) and can include scaling, cropping, rotations, as well as non-rigid deformations.
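As a concrete case of the transformations listed above, one common choice is a 2D affine map. The sketch below assumes normalised coordinates in $[-1, 1]$ and introduces symbols that the text above does not define ($s$ for scale, $\phi$ for rotation angle, $t_x, t_y$ for translation, superscripts $t$ and $s$ for output and input coordinates). It is an illustration, not the only possible parameterisation:

$$
\begin{pmatrix} x^{s} \\ y^{s} \end{pmatrix}
=
\begin{pmatrix}
s\cos\phi & -s\sin\phi & t_x \\
s\sin\phi & \phantom{-}s\cos\phi & t_y
\end{pmatrix}
\begin{pmatrix} x^{t} \\ y^{t} \\ 1 \end{pmatrix}
$$

Under this convention, $\phi = 0$ with $s < 1$ makes the output sample a smaller window of the input centred at $(t_x, t_y)$, which acts as a crop with zoom, while $s = 1$ and $t_x = t_y = 0$ rotates the sampling grid by $\phi$.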

The architecture is shown in the figure to the right. The input feature map $U$ is passed to a localisation network which regresses the transformation parameters $\theta$. The regular spatial grid $G$ over $V$ is transformed to the sampling grid $T_{\theta}(G)$, which is applied to $U$, producing the warped output feature map $V$. The combination of the localisation network and sampling mechanism defines a spatial transformer.
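The three stages described above can be sketched as a forward pass in NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the transformation is taken to be a 2x3 affine matrix, coordinates are normalised to $[-1, 1]$, the sampler is bilinear with zeros outside the input, and `localisation_net` is a hypothetical stand-in (a single linear layer) for what would normally be a small trained network. All function and variable names are chosen for this example.

```python
import numpy as np

def localisation_net(U, W, b):
    """Hypothetical stand-in: one linear layer regressing a 2x3 affine theta from U."""
    return (W @ U.ravel() + b).reshape(2, 3)

def affine_grid(theta, out_h, out_w):
    """Transform the regular grid G over V into source coordinates T_theta(G) in U."""
    ys, xs = np.meshgrid(np.linspace(-1, 1, out_h),
                         np.linspace(-1, 1, out_w), indexing="ij")
    # Homogeneous target coordinates, shape (3, out_h * out_w).
    G = np.stack([xs.ravel(), ys.ravel(), np.ones(out_h * out_w)])
    src = theta @ G  # shape (2, out_h * out_w)
    return src[0].reshape(out_h, out_w), src[1].reshape(out_h, out_w)

def bilinear_sample(U, x_s, y_s):
    """Sample a 2D map U at normalised coordinates; samples outside U read as zero."""
    H, W = U.shape
    # Convert normalised [-1, 1] coordinates to pixel indices.
    x = (x_s + 1) * (W - 1) / 2
    y = (y_s + 1) * (H - 1) / 2
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = x0 + 1, y0 + 1
    wx, wy = x - x0, y - y0

    def pixel(yy, xx):
        valid = (xx >= 0) & (xx < W) & (yy >= 0) & (yy < H)
        values = U[np.clip(yy, 0, H - 1), np.clip(xx, 0, W - 1)]
        return np.where(valid, values, 0.0)

    return ((1 - wy) * (1 - wx) * pixel(y0, x0) + (1 - wy) * wx * pixel(y0, x1)
            + wy * (1 - wx) * pixel(y1, x0) + wy * wx * pixel(y1, x1))

def spatial_transformer(U, W, b, out_shape):
    """Localisation network -> sampling grid -> sampler, returning V and theta."""
    theta = localisation_net(U, W, b)
    x_s, y_s = affine_grid(theta, *out_shape)
    return bilinear_sample(U, x_s, y_s), theta

# Usage: zero weights and an identity bias make the block start as a no-op.
U = np.arange(16, dtype=float).reshape(4, 4)
W_zero = np.zeros((6, U.size))
b_identity = np.array([1.0, 0.0, 0.0, 0.0, 1.0, 0.0])
V, theta = spatial_transformer(U, W_zero, b_identity, out_shape=(4, 4))
```

With the identity parameters used here, `V` reproduces `U`. The sketch handles a single 2D map; for a multi-channel feature map the same sampling grid would be applied to every channel. It covers only the forward pass, so training the localisation network would need an autodiff framework in place of plain NumPy.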