
What is: Random elastic image morphing?

Year: 2000
Data Source: CC BY-SA, https://paperswithcode.com

M. Bulacu, A. Brink, T. v. d. Zant and L. Schomaker, "Recognition of Handwritten Numerical Fields in a Large Single-Writer Historical Collection," 2009 10th International Conference on Document Analysis and Recognition, Barcelona, Spain, 2009, pp. 808-812, doi: 10.1109/ICDAR.2009.8.

Code: https://github.com/GrHound/imagemorph.c

In contrast with the EM algorithm (Baum-Welch) for HMMs, training the basic character recognizer for a segmentation-based handwriting recognition system is a tricky issue without a standard solution. Our approach was to collect a labeled base set of digit images segmented by hand, and then to augment this data by generating synthetic examples using random geometric distortions. We were inspired by the record performance in digit recognition reported in Simard et al. (2003), but developed our own algorithm for this purpose. For every pixel (i, j) of the template image, a random displacement vector (Δx, Δy) is generated. The displacement field of the complete image is smoothed using a Gaussian convolution kernel with standard deviation σ, and is finally rescaled to an average amplitude A. The new morphed image is generated from the displacement field using bilinear interpolation: i' = i + Δx, j' = j + Δy. The morphing process is thus controlled by two parameters, the smoothing radius σ and the average pixel displacement A, both measured in units of pixels.
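The steps above (a random per-pixel displacement field, Gaussian smoothing, rescaling to a target average amplitude, bilinear resampling) can be sketched as follows. This is a minimal Python/NumPy/SciPy sketch, not the original C implementation from imagemorph.c; the function and parameter names are my own, and details such as the random distribution and boundary handling are assumptions.

```python
# Sketch of random elastic morphing for a 2-D grayscale image.
# Assumes NumPy and SciPy; not the original imagemorph.c implementation.
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def elastic_morph(img, avg_displacement=1.5, sigma=8.5, rng=None):
    """Apply a smoothed random displacement field to `img`."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = img.shape
    # 1. Random displacement vector (dx, dy) for every pixel.
    dx = rng.uniform(-1.0, 1.0, size=(h, w))
    dy = rng.uniform(-1.0, 1.0, size=(h, w))
    # 2. Smooth the whole field with a Gaussian kernel of std. dev. sigma.
    dx = gaussian_filter(dx, sigma)
    dy = gaussian_filter(dy, sigma)
    # 3. Rescale so the mean displacement amplitude equals avg_displacement.
    amp = np.mean(np.hypot(dx, dy))
    if amp > 0:
        dx *= avg_displacement / amp
        dy *= avg_displacement / amp
    # 4. Resample the template at (i + dy, j + dx) with bilinear
    #    interpolation (order=1).
    ii, jj = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.array([ii + dy, jj + dx])
    return map_coordinates(img, coords, order=1, mode="reflect")
```

Because bilinear interpolation takes convex combinations of neighboring pixels, the output stays within the intensity range of the input.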

An intuitive interpretation is to imagine that the characters are written on a rubber sheet to which we apply non-uniform random local distortions, contracting one part of the character while perhaps expanding another (see Fig. 5). This random elastic morphing is more general than affine transforms, providing a rich ensemble of shape variations. We applied it to our base set of labeled digits (~130 samples per class) to obtain a much expanded training dataset (by a factor of 1 up to 80). The expansion factor f controls the amount of synthetic data: for every base example, f - 1 additional morphed patterns are generated and used in training.
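The expansion scheme itself is a straightforward loop. In the sketch below, the `morph` callable stands in for the elastic morphing; the function name and signature are illustrative, not taken from the paper's code.

```python
# Illustrative dataset expansion by factor f: each base example yields
# itself plus f - 1 morphed variants, all sharing the original label.
def expand_dataset(base_images, base_labels, f, morph):
    """Return (images, labels) with f * len(base_images) entries."""
    images, labels = [], []
    for img, label in zip(base_images, base_labels):
        images.append(img)
        labels.append(label)
        for _ in range(f - 1):
            images.append(morph(img))
            labels.append(label)
    return images, labels
```

With ~130 base samples per class and f = 80, this yields on the order of 10,000 training patterns per digit class.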

This is a cheap method relying on random numbers and basic computer graphics. In this way, a virtually infinite volume of training samples can be fabricated. This stratagem is very successful and does not increase the load at recognition time for parametric classifiers. Essentially, we turn the tables: instead of trying to recognize a character garbled in an unpredictable way by the writer in the instantaneous act of handwriting, we generate the deformations ourselves, while training a neural network to become immune to such distortions.

The accompanying image, a crop of an RGB page scan containing the cursive handwritten word 'Zwolle', was morphed a number of times with parameters dist=1.5 and radius=8.5. This distortion is sufficient to introduce a believable variance in the appearance.

imagemorph 1.5 8.5 < Zwolle.ppm > Zwolle-morphed.ppm

The Netpbm image formats are supported by many CV tools. You can use ImageMagick's convert, or other tools, to convert to/from .ppm.
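If ImageMagick is not at hand, Pillow (a common Python imaging library) can also read and write PPM. A minimal sketch, with illustrative filenames:

```python
# Write and read binary PPM with Pillow (filenames are illustrative).
from PIL import Image

# Create a small placeholder RGB image and save it as PPM.
Image.new("RGB", (8, 8), "white").save("Zwolle.ppm")

# Reading a PPM back works like any other supported format.
img = Image.open("Zwolle.ppm")
print(img.size, img.mode)
```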

Also see: P. Simard, D. Steinkraus, and J. Platt, "Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis," Proc. 7th ICDAR, pp. 958-962, Edinburgh, Scotland, 2003.