A New Cost Function for Typewritten Digits Segmentation

This work addresses the segmentation of typewritten digits in low-quality forms whose fields contain broken and touching digits. We propose a new segmentation cost function that augments two traditional techniques, vertical projections and the Tsujimoto metric, with information about the background of the digit. Unlike other techniques reported in the literature, ours obtains a near-optimal number of break points in fields containing broken, blurred, and touching characters, which leads to high accuracy in the overall OCR system. Segmentation accuracy on form fields is 99.74% on a sample of 11,283 fields from 144 low-quality forms, which gives the automatic recognition process a final accuracy of 99.42% of digits correctly classified.
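
As an illustration of the vertical-projection ingredient only, the following is a minimal sketch and not the cost function proposed in this work. It assumes a field is given as a binary image (a list of rows, 1 for ink and 0 for background); the helper names `vertical_projection` and `candidate_breaks` and the `max_ink` threshold are hypothetical choices made for this example. The Tsujimoto metric and the background information are deliberately left out.

```python
def vertical_projection(image):
    """Count ink pixels (value 1) in each column of a binary image."""
    if not image:
        return []
    width = len(image[0])
    return [sum(row[x] for row in image) for x in range(width)]


def candidate_breaks(profile, max_ink=0):
    """Return the centre column of every interior run of low-ink columns.

    A column is "low-ink" when its projection value is <= max_ink
    (an assumed threshold). Runs touching the left or right border
    are ignored because they are margins, not gaps between digits.
    """
    breaks = []
    start = None  # first column of the current low-ink run, if any
    for x, ink in enumerate(profile):
        if ink <= max_ink:
            if start is None:
                start = x
        else:
            if start is not None and start > 0:
                breaks.append((start + x - 1) // 2)
            start = None
    return breaks


# Two strokes separated by a clean two-column gap.
field = [
    [1, 1, 0, 0, 1, 1],
    [1, 0, 0, 0, 0, 1],
    [1, 1, 0, 0, 1, 1],
]
profile = vertical_projection(field)
separated = candidate_breaks(profile)

# A touching pair: the valley never reaches zero, so a plain
# zero-ink test finds nothing and the threshold must be relaxed.
touching_profile = [3, 2, 1, 2, 3]
strict = candidate_breaks(touching_profile)
relaxed = candidate_breaks(touching_profile, max_ink=1)
```

With a zero threshold the touching pair yields no break point, and relaxing the threshold risks splitting broken digits. This is the kind of ambiguity that motivates combining the projection with further evidence, as the cost function described above does.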
