What is: Spatial Feature Transform?

Spatial Feature Transform, or SFT, is a layer that generates affine transformation parameters for spatial-wise feature modulation. It was originally proposed in the context of image super-resolution (SR). An SFT layer learns a mapping function $\mathcal{M}$ that outputs a modulation parameter pair $(\mathbf{\gamma}, \mathbf{\beta})$ based on some prior condition $\Psi$. The learned parameter pair adaptively influences the outputs by applying a spatially varying affine transformation to each intermediate feature map in an SR network. During testing, only a single forward pass is needed to generate the high-resolution (HR) image given the low-resolution (LR) input and segmentation probability maps.

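More precisely, the prior $\Psi$ is modeled by a pair of affine transformation parameters $(\mathbf{\gamma}, \mathbf{\beta})$ through a mapping function $\mathcal{M}: \Psi \mapsto(\mathbf{\gamma}, \mathbf{\beta})$. Consequently, the output $\hat{\mathbf{y}}$ of the SR network $G_{\mathbf{\theta}}$ for an LR input $\mathbf{x}$ is conditioned on this pair: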

$$\hat{\mathbf{y}}=G_{\mathbf{\theta}}(\mathbf{x} \mid \mathbf{\gamma}, \mathbf{\beta}), \quad(\mathbf{\gamma}, \mathbf{\beta})=\mathcal{M}(\Psi)$$
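As a rough illustration of the mapping $\mathcal{M}$, the sketch below produces $(\mathbf{\gamma}, \mathbf{\beta})$ from a toy condition map. The text above does not specify the architecture of $\mathcal{M}$, so this assumes a single per-pixel linear map (equivalent to a 1x1 convolution) with random, untrained weights. The names `make_mapping`, `w_gamma`, and `w_beta`, and the choice to centre $\mathbf{\gamma}$ around 1, are hypothetical and used only for illustration.

```python
import numpy as np

def make_mapping(cond_channels, feat_channels, seed=0):
    """Hypothetical stand-in for M: per-pixel linear maps for gamma and beta."""
    rng = np.random.default_rng(seed)
    # Assumption: random weights in place of learned ones.
    w_gamma = rng.normal(scale=0.1, size=(feat_channels, cond_channels))
    w_beta = rng.normal(scale=0.1, size=(feat_channels, cond_channels))

    def mapping(psi):
        # psi: condition of shape (K, H, W), e.g. segmentation probability maps.
        # Returns gamma and beta, each of shape (C, H, W).
        # Assumption: gamma is offset by 1 so a zero condition means "no scaling".
        gamma = 1.0 + np.einsum("ck,khw->chw", w_gamma, psi)
        beta = np.einsum("ck,khw->chw", w_beta, psi)
        return gamma, beta

    return mapping

# Toy condition: 3 classes on a 4x4 grid; left half is class 0, right half class 1.
psi = np.zeros((3, 4, 4))
psi[0, :, :2] = 1.0
psi[1, :, 2:] = 1.0

mapping = make_mapping(cond_channels=3, feat_channels=8)
gamma, beta = mapping(psi)
print(gamma.shape, beta.shape)  # (8, 4, 4) (8, 4, 4)
```

Because the map is applied per pixel, locations with the same condition receive the same parameters, while locations with different conditions generally do not.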

After obtaining $(\mathbf{\gamma}, \mathbf{\beta})$ from the conditions, the transformation is carried out by scaling and shifting the feature maps of a specific layer:

$$\operatorname{SFT}(\mathbf{F} \mid \mathbf{\gamma}, \mathbf{\beta})=\mathbf{\gamma} \odot \mathbf{F}+\mathbf{\beta}$$

where $\mathbf{F}$ denotes the feature maps, whose dimensions are the same as those of $\mathbf{\gamma}$ and $\mathbf{\beta}$, and $\odot$ denotes element-wise multiplication, i.e., the Hadamard product. Since the spatial dimensions are preserved, the SFT layer performs not only feature-wise manipulation but also spatial-wise transformation.
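The scale-and-shift step can be sketched in a few lines of NumPy. This is a minimal illustration under assumed toy values, not the original implementation. The function name `sft` and the hand-picked `gamma` and `beta` arrays are hypothetical; in practice they would come from the mapping $\mathcal{M}$.

```python
import numpy as np

def sft(features, gamma, beta):
    """Apply gamma * F + beta element-wise; all inputs share shape (C, H, W)."""
    if not (features.shape == gamma.shape == beta.shape):
        raise ValueError("features, gamma and beta must have identical shapes")
    return gamma * features + beta

# Toy feature maps: 2 channels on a 2x2 grid, all ones.
F = np.ones((2, 2, 2))

# Hand-picked parameters that differ by spatial location:
# the left column is left unchanged, the right column is scaled by 2 and shifted by 0.5.
gamma = np.ones((2, 2, 2))
gamma[:, :, 1] = 2.0
beta = np.zeros((2, 2, 2))
beta[:, :, 1] = 0.5

out = sft(F, gamma, beta)
print(out[0])  # left column stays 1.0, right column becomes 2.5
```

Since `gamma` and `beta` vary across the grid, two pixels in the same channel are modulated differently, which is the spatial-wise behaviour described above.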

Source	Recovering Realistic Texture in Image Super-resolution by Deep Spatial Feature Transform
Year	2018
Data Source	CC BY-SA - https://paperswithcode.com