AICurious Logo

What is: Spatial Feature Transform?

SourceRecovering Realistic Texture in Image Super-resolution by Deep Spatial Feature Transform
Year2000
Data SourceCC BY-SA - https://paperswithcode.com

Spatial Feature Transform, or SFT, is a layer that generates affine transformation parameters for spatial-wise feature modulation, and was originally proposed within the context of image super-resolution. A Spatial Feature Transform (SFT) layer learns a mapping function M\mathcal{M} that outputs a modulation parameter pair (γ,β)(\mathbf{\gamma}, \mathbf{\beta}) based on some prior condition Ψ\Psi. The learned parameter pair adaptively influences the outputs by applying an affine transformation spatially to each intermediate feature maps in an SR network. During testing, only a single forward pass is needed to generate the HR image given the LR input and segmentation probability maps.

More precisely, the prior Ψ\Psi is modeled by a pair of affine transformation parameters (γ,β)(\mathbf{\gamma}, \mathbf{\beta}) through a mapping function M:Ψ(γ,β)\mathcal{M}: \Psi \mapsto(\mathbf{\gamma}, \mathbf{\beta}). Consequently,

y^=Gθ(xγ,β),(γ,β)=M(Ψ)\hat{\mathbf{y}}=G_{\mathbf{\theta}}(\mathbf{x} \mid \mathbf{\gamma}, \mathbf{\beta}), \quad(\mathbf{\gamma}, \mathbf{\beta})=\mathcal{M}(\Psi)

After obtaining (γ,β)(\mathbf{\gamma}, \mathbf{\beta}) from conditions, the transformation is carried out by scaling and shifting feature maps of a specific layer:

SFT(Fγ,β)=γF+β\operatorname{SFT}(\mathbf{F} \mid \mathbf{\gamma}, \mathbf{\beta})=\mathbf{\gamma} \odot \mathbf{F}+\mathbf{\beta}

where F\mathbf{F} denotes the feature maps, whose dimension is the same as γ\gamma and β\mathbf{\beta}, and \odot is referred to element-wise multiplication, i.e., Hadamard product. Since the spatial dimensions are preserved, the SFT layer not only performs feature-wise manipulation but also spatial-wise transformation.