
What is: Filter Response Normalization?

Source: Filter Response Normalization Layer: Eliminating Batch Dependence in the Training of Deep Neural Networks
Year: 2019
Data Source: CC BY-SA - https://paperswithcode.com

Filter Response Normalization (FRN) is a normalization method that combines normalization with an activation function and can be used as a replacement for other normalization schemes and activation functions. It operates on each activation channel of each batch element independently, eliminating the dependence on other batch elements.

To demonstrate, assume we are dealing with a feed-forward convolutional neural network. We follow the usual convention that the filter responses (activation maps) produced after a convolution operation form a 4D tensor $X$ with shape $[B, W, H, C]$, where $B$ is the mini-batch size, $W, H$ are the spatial extents of the map, and $C$ is the number of filters used in the convolution. $C$ is also referred to as the number of output channels. Let $x = X_{b,:,:,c} \in \mathcal{R}^{N}$, where $N = W \times H$, be the vector of filter responses for the $c^{th}$ filter of the $b^{th}$ batch point. Let $\nu^2 = \sum_i x_i^2 / N$ be the mean squared norm of $x$.
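As a concrete illustration, $\nu^2$ is just a mean of squared responses over the spatial positions, computed separately for every (batch element, channel) pair. The short PyTorch sketch below keeps the $[B, W, H, C]$ layout from the text; PyTorch itself, the tensor sizes, and the variable names are assumptions made only for illustration.

```python
import torch

# Toy activation map with the [B, W, H, C] layout used above (values are arbitrary).
B, W, H, C = 2, 8, 8, 16
X = torch.randn(B, W, H, C)

# nu^2: mean squared norm over the N = W * H spatial positions,
# computed independently for every (batch element, channel) pair.
nu2 = X.pow(2).mean(dim=(1, 2), keepdim=True)  # shape [B, 1, 1, C]
```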

Then Filter Response Normalization is defined as the following:

$$\hat{x} = \frac{x}{\sqrt{\nu^2 + \epsilon}},$$

where $\epsilon$ is a small positive constant to prevent division by zero.
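Continuing the sketch above, the normalization itself is a single element-wise rescaling that broadcasts $\nu^2$ over the spatial dimensions; the value chosen for `eps` here is only an assumed example.

```python
eps = 1e-6  # small positive constant epsilon, as in the text

# x_hat = x / sqrt(nu^2 + eps), broadcast over the W and H dimensions
X_hat = X * torch.rsqrt(nu2 + eps)  # shape [B, W, H, C]
```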

As with other normalization schemes, the normalized responses are followed by a learned per-channel affine transform, $y = \gamma \hat{x} + \beta$. However, the lack of mean centering in FRN can lead to activations having an arbitrary bias away from zero. Such a bias, in conjunction with ReLU, can have a detrimental effect on learning and lead to poor performance and dead units. To address this, the authors augment ReLU with a learned threshold $\tau$ to yield:

$$z = \max(y, \tau)$$

Since $\max(y, \tau) = \max(y - \tau, 0) + \tau = \text{ReLU}(y - \tau) + \tau$, the effect of this activation is the same as having a shared bias before and after ReLU.
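Putting the pieces together, a minimal module combining FRN with the thresholded activation (called a Thresholded Linear Unit, or TLU, in the paper) might look as follows. This is only a sketch under the $[B, W, H, C]$ layout assumed above: the class name is hypothetical, the learnable per-channel parameters $\gamma$, $\beta$, and $\tau$ follow the description in the text, and details such as parameter initialization and the channels-first layout common in PyTorch code are deliberately simplified.

```python
import torch
import torch.nn as nn


class FRNWithTLU(nn.Module):
    """Filter Response Normalization followed by a thresholded activation (sketch).

    Expects inputs of shape [B, W, H, C]; one gamma, beta, and tau per channel.
    """

    def __init__(self, num_channels: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.gamma = nn.Parameter(torch.ones(num_channels))
        self.beta = nn.Parameter(torch.zeros(num_channels))
        self.tau = nn.Parameter(torch.zeros(num_channels))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # nu^2: mean squared norm over the spatial extent, per batch element and channel.
        nu2 = x.pow(2).mean(dim=(1, 2), keepdim=True)
        # FRN: rescale by 1 / sqrt(nu^2 + eps), then apply the learned per-channel affine transform.
        x_hat = x * torch.rsqrt(nu2 + self.eps)
        y = self.gamma * x_hat + self.beta
        # TLU: ReLU with a learned threshold, z = max(y, tau).
        return torch.maximum(y, self.tau)


# Usage on a random activation map (shapes are arbitrary).
layer = FRNWithTLU(num_channels=16)
out = layer(torch.randn(2, 8, 8, 16))  # same shape as the input
```

Because the reduction runs only over the spatial dimensions, each batch element and channel is normalized on its own, which is what removes the dependence on the rest of the mini-batch.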