
What is: Ghost Module?

Source: GhostNet: More Features from Cheap Operations
Year: 2020
Data Source: CC BY-SA - https://paperswithcode.com

A Ghost Module is an image model block for convolutional neural networks that aims to generate more features while using fewer parameters. Specifically, an ordinary convolutional layer in a deep neural network is split into two parts. The first part involves ordinary convolutions, but their total number is controlled. Given the intrinsic feature maps from the first part, a series of simple linear operations is then applied to generate more feature maps.

Ghost modules aim to reduce the redundancy that widely exists in the intermediate feature maps computed by mainstream CNNs. In practice, given the input data $X\in\mathbb{R}^{c\times h\times w}$, where $c$ is the number of input channels and $h$ and $w$ are the height and width of the input data, respectively, the operation of an arbitrary convolutional layer for producing $n$ feature maps can be formulated as

$$Y = X * f + b,$$

where $*$ is the convolution operation, $b$ is the bias term, $Y\in\mathbb{R}^{h'\times w'\times n}$ is the output feature map with $n$ channels, and $f\in\mathbb{R}^{c\times k\times k\times n}$ denotes the convolution filters in this layer. In addition, $h'$ and $w'$ are the height and width of the output data, and $k\times k$ is the kernel size of the convolution filters $f$. During this convolution procedure, the required number of FLOPs can be calculated as $n\cdot h'\cdot w'\cdot c\cdot k\cdot k$, which is often very large since the number of filters $n$ and the channel number $c$ are generally large (e.g. 256 or 512).
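For concreteness, this FLOP count follows directly from the layer dimensions. The snippet below is a small illustrative sketch; the layer sizes are hypothetical and not taken from the paper:

```python
# Minimal sketch: counting the multiply-accumulate operations of an ordinary
# convolution using the formula n * h' * w' * c * k * k from the text above.
def conv_flops(n, h_out, w_out, c, k):
    """FLOPs of an ordinary convolution producing n feature maps of size
    h_out x w_out from c input channels with k x k kernels."""
    return n * h_out * w_out * c * k * k

# Hypothetical layer: 256 -> 256 channels, 3x3 kernels, 14x14 output maps.
print(conv_flops(n=256, h_out=14, w_out=14, c=256, k=3))  # 115,605,504
```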

Here, the number of parameters (in $f$ and $b$) to be optimized is explicitly determined by the dimensions of the input and output feature maps. The output feature maps of convolutional layers often contain much redundancy, and some of them can be similar to each other. The paper points out that it is unnecessary to generate these redundant feature maps one by one with a large number of FLOPs and parameters. Suppose instead that the output feature maps are "ghosts" of a handful of intrinsic feature maps obtained through some cheap transformations. These intrinsic feature maps are often of smaller size and are produced by ordinary convolution filters. Specifically, $m$ intrinsic feature maps $Y'\in\mathbb{R}^{h'\times w'\times m}$ are generated using a primary convolution:

$$Y' = X * f',$$

where $f'\in\mathbb{R}^{c\times k\times k\times m}$ denotes the utilized filters, $m\leq n$, and the bias term is omitted for simplicity. The hyper-parameters (such as filter size, stride, and padding) are the same as those in the ordinary convolution to keep the spatial size (i.e. $h'$ and $w'$) of the output feature maps consistent. To further obtain the desired $n$ feature maps, a series of cheap linear operations is applied to each intrinsic feature map in $Y'$ to generate $s$ ghost features according to the following function:

$$y_{ij} = \Phi_{i,j}(y'_i), \quad \forall\; i = 1,\dots,m, \;\; j = 1,\dots,s,$$

where $y'_i$ is the $i$-th intrinsic feature map in $Y'$ and $\Phi_{i,j}$ is the $j$-th (except the last one) linear operation for generating the $j$-th ghost feature map $y_{ij}$; that is to say, $y'_i$ can have one or more ghost feature maps $\{y_{ij}\}_{j=1}^{s}$. The last $\Phi_{i,s}$ is the identity mapping, which preserves the intrinsic feature maps. In this way we obtain $n = m\cdot s$ feature maps $Y = [y_{11}, y_{12}, \cdots, y_{ms}]$ as the output of a Ghost module. Note that the linear operations $\Phi$ operate on each channel separately, so their computational cost is much lower than that of ordinary convolution. In practice, there can be several different linear operations in a Ghost module, e.g. $3\times 3$ and $5\times 5$ linear kernels, which are analyzed in the experimental part of the paper.
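To make the module concrete, below is a minimal PyTorch-style sketch of a Ghost module. It assumes, as is common in GhostNet implementations, that the cheap linear operations $\Phi$ are realized as depthwise convolutions and that the identity mapping is handled by concatenating the intrinsic maps with the ghost maps; the channel sizes and the ratio $s$ used here are illustrative, not prescribed by the paper.

```python
import math
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    """Sketch of a Ghost module (illustrative, not the official implementation).

    The cheap linear operations Phi are assumed to be 3x3 depthwise
    convolutions applied to the m intrinsic feature maps.
    """
    def __init__(self, in_channels, out_channels, kernel_size=1, ratio=2, dw_size=3):
        super().__init__()
        self.out_channels = out_channels
        # m intrinsic maps, with n = m * s (s = ratio)
        init_channels = math.ceil(out_channels / ratio)
        cheap_channels = init_channels * (ratio - 1)

        # Primary (ordinary) convolution producing the intrinsic feature maps Y'
        self.primary_conv = nn.Sequential(
            nn.Conv2d(in_channels, init_channels, kernel_size,
                      padding=kernel_size // 2, bias=False),
            nn.BatchNorm2d(init_channels),
            nn.ReLU(inplace=True),
        )
        # Cheap linear operations: depthwise convolution on each intrinsic map
        self.cheap_operation = nn.Sequential(
            nn.Conv2d(init_channels, cheap_channels, dw_size,
                      padding=dw_size // 2, groups=init_channels, bias=False),
            nn.BatchNorm2d(cheap_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        y_prime = self.primary_conv(x)          # intrinsic feature maps Y'
        ghosts = self.cheap_operation(y_prime)  # ghost feature maps
        # Identity mapping keeps the intrinsic maps; concatenate with ghosts
        out = torch.cat([y_prime, ghosts], dim=1)
        return out[:, :self.out_channels, :, :]

# Illustrative usage (hypothetical sizes):
# module = GhostModule(in_channels=64, out_channels=128, ratio=2)
# y = module(torch.randn(1, 64, 56, 56))  # -> shape (1, 128, 56, 56)
```

With ratio $s = 2$, roughly half of the output channels come from the ordinary convolution and the other half from the much cheaper depthwise operation, which is where the parameter and FLOP savings of the Ghost module come from.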