
What is: Conditional Batch Normalization?

Source: Modulating early visual processing by language
Year: 2017
Data Source: CC BY-SA - https://paperswithcode.com

Conditional Batch Normalization (CBN) is a class-conditional variant of batch normalization. The key idea is to predict the $\gamma$ and $\beta$ of the batch normalization from an embedding, e.g. a language embedding in VQA. CBN enables the linguistic embedding to manipulate entire feature maps by scaling them up or down, negating them, or shutting them off. CBN has also been used in GANs to allow class information to affect the batch normalization parameters.

Consider a single convolutional layer with batch normalization module $\text{BN}\left(F_{i,c,h,w} \mid \gamma_c, \beta_c\right)$ for which pretrained scalars $\gamma_c$ and $\beta_c$ are available. We would like to directly predict these affine scaling parameters from, e.g., a language embedding $\mathbf{e}_q$. At the start of training, these parameters must be close to the pretrained values to recover the original ResNet model, as a poor initialization could significantly deteriorate performance. Unfortunately, it is difficult to initialize a network to output the pretrained $\gamma$ and $\beta$. For these reasons, the authors propose to predict changes $\delta\gamma_c$ and $\delta\beta_c$ on top of the frozen original scalars, for which it is straightforward to initialize a neural network to produce an output with zero mean and small variance.

The authors use a one-hidden-layer MLP to predict these deltas from a question embedding $\mathbf{e}_q$ for all feature maps within the layer:

$$\Delta\beta = \text{MLP}\left(\mathbf{e}_q\right)$$

$$\Delta\gamma = \text{MLP}\left(\mathbf{e}_q\right)$$

So, given a feature map with $C$ channels, these MLPs output a vector of size $C$. We then add these predictions to the $\beta$ and $\gamma$ parameters:

$$\hat{\beta}_c = \beta_c + \Delta\beta_c$$

$$\hat{\gamma}_c = \gamma_c + \Delta\gamma_c$$
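
To make the initialization argument concrete, the delta predictor can be a one-hidden-layer MLP whose output layer is initialized to zero, so that $\Delta\gamma = \Delta\beta = 0$ at the start of training and the network behaves exactly like the pretrained model. Below is a minimal PyTorch sketch; the class name, embedding size, and hidden size are illustrative assumptions rather than the authors' exact implementation.

```python
import torch
import torch.nn as nn


class DeltaMLP(nn.Module):
    """One-hidden-layer MLP that predicts per-channel deltas (Δγ or Δβ)
    from an embedding e_q. Sketch only: embedding/hidden sizes and the
    zero-initialization scheme are assumptions chosen so the initial
    output is zero, keeping the pretrained γ and β intact at the start."""

    def __init__(self, embedding_dim: int, num_channels: int, hidden_dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embedding_dim, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, num_channels),
        )
        # Zero-init the output layer so Δγ = Δβ = 0 at initialization.
        nn.init.zeros_(self.net[-1].weight)
        nn.init.zeros_(self.net[-1].bias)

    def forward(self, e_q: torch.Tensor) -> torch.Tensor:
        # e_q: (batch, embedding_dim) -> deltas: (batch, num_channels)
        return self.net(e_q)
```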

Finally, these updated $\hat{\beta}$ and $\hat{\gamma}$ are used as parameters for the batch normalization: $\text{BN}\left(F_{i,c,h,w} \mid \hat{\gamma}_c, \hat{\beta}_c\right)$. The authors freeze all ResNet parameters, including $\gamma$ and $\beta$, during training. A ResNet consists of four stages of computation, each subdivided into several residual blocks. In each block, the authors apply CBN to the three convolutional layers.
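
Putting the pieces together, a conditional batch normalization layer wraps a frozen, pretrained BatchNorm2d, predicts per-sample deltas from the embedding, and applies the shifted affine parameters after normalization. The sketch below reuses the `DeltaMLP` defined above; normalizing with the frozen running statistics is an assumption of this sketch (not necessarily the authors' exact setup), and all names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConditionalBatchNorm2d(nn.Module):
    """Minimal sketch of Conditional Batch Normalization built on a frozen,
    pretrained BatchNorm2d layer. Names and details are illustrative."""

    def __init__(self, pretrained_bn: nn.BatchNorm2d, embedding_dim: int, hidden_dim: int = 512):
        super().__init__()
        self.bn = pretrained_bn
        # Freeze the pretrained γ and β, as in the paper.
        for p in self.bn.parameters():
            p.requires_grad = False
        num_channels = self.bn.num_features
        self.delta_gamma = DeltaMLP(embedding_dim, num_channels, hidden_dim)
        self.delta_beta = DeltaMLP(embedding_dim, num_channels, hidden_dim)

    def forward(self, x: torch.Tensor, e_q: torch.Tensor) -> torch.Tensor:
        # Normalize with the frozen running statistics, without affine scaling.
        x_hat = F.batch_norm(
            x,
            self.bn.running_mean,
            self.bn.running_var,
            weight=None,
            bias=None,
            training=False,
            eps=self.bn.eps,
        )
        # Conditional affine parameters: γ̂ = γ + Δγ, β̂ = β + Δβ, per sample.
        gamma_hat = self.bn.weight + self.delta_gamma(e_q)  # (batch, C)
        beta_hat = self.bn.bias + self.delta_beta(e_q)      # (batch, C)
        return gamma_hat[:, :, None, None] * x_hat + beta_hat[:, :, None, None]
```

Note that because the embedding differs per example, $\hat{\gamma}$ and $\hat{\beta}$ are per-sample vectors, so the affine transform is applied manually after normalization rather than through the layer's built-in per-channel weight and bias.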