Automatic separation of machine-printed and hand-written text lines

Many types of documents contain machine-printed and hand-written text intermixed. Because the optical character recognition (OCR) methodologies for machine-printed and hand-written text differ, the two types of text must be separated before they are fed to their respective OCR systems. In this paper, we present such a separation scheme for both Bangla and Devnagari characters. The scheme is based on structural and statistical features of machine-printed and hand-written text lines. The classification scheme has an accuracy of about 98.3%.
