Parameter calibration for synthesizing realistic-looking variability in offline handwriting

Motivated by the widely accepted principle that a recognition system performs better with more training data, we conducted experiments in which human subjects evaluated a mixture of real English handwritten text lines and text lines altered from existing handwriting with varying degrees of distortion. The synthetic handwriting is generated with a perturbation method by T. Varga and H. Bunke that distorts an entire text line. Our experiments have two purposes. First, we want to calibrate the distortion parameter settings for Varga and Bunke's perturbation model. Second, we intend to compare the effects of parameter settings on different writing styles: block, cursive, and mixed. From the preliminary experimental results, we determined appropriate ranges for the parameter amplitude and found that parameter settings should be adjusted for different handwriting styles. With proper parameter settings, it should be possible to generate large amounts of training and testing data for building better off-line handwriting recognition systems.
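To make the idea of a line-level distortion with a tunable amplitude concrete, the following is a minimal sketch and not Varga and Bunke's actual model. It assumes a binary text-line image stored as a list of pixel rows and a cosine-shaped vertical displacement; the function name `bend_baseline` and the parameters `amplitude`, `wavelength`, and `phase` are hypothetical names chosen for illustration.

```python
import math


def bend_baseline(image, amplitude, wavelength, phase=0.0, background=0):
    """Shift each pixel column vertically by a cosine-shaped offset.

    image      -- list of rows, each a list of pixel values (assumed layout)
    amplitude  -- maximum vertical shift in pixels (the distortion strength)
    wavelength -- horizontal period of the cosine in pixels (must be non-zero)
    phase      -- phase offset of the cosine in radians
    """
    height = len(image)
    width = len(image[0]) if height else 0
    out = [[background] * width for _ in range(height)]
    for x in range(width):
        # Positive dy moves ink toward larger row indices (downward).
        dy = round(amplitude * math.cos(2 * math.pi * x / wavelength + phase))
        for y in range(height):
            src = y - dy
            if 0 <= src < height:
                out[y][x] = image[src][x]
    return out


# A 3x4 toy "text line" with a single horizontal stroke in the middle row.
line = [
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
bent = bend_baseline(line, amplitude=1, wavelength=4)
```

In this sketch, an amplitude of zero leaves the line unchanged, while larger amplitudes bend the stroke more strongly. Calibrating such a parameter amounts to finding the range in which the output still looks like natural handwriting to human readers.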
