A Robust Two Level Classification Algorithm for Text Localization in Documents

This paper describes a two-level classification algorithm that discriminates handwritten elements from printed text in a printed document. The proposed technique is independent of the size, slant, orientation, translation, and other variations of the handwritten text. As a pre-processing step, we remove the noise components from the document. At the first level of classification, we localize the handwritten text with two classifiers, a nearest neighbour classifier and a Support Vector Machine (SVM) classifier, and present a comparison between them. The features extracted from the document are seven invariant central moments, and based on these features each text element is classified as handwritten or printed. At the second level, we use Delaunay triangulation to reclassify the misclassified elements: the triangulation is imposed on the centroid points of the connected components, and features extracted from the resulting triangles are used for the reclassification.
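The first-level features can be illustrated with a minimal sketch. It assumes that the "seven invariant central moments" are Hu's seven moment invariants computed on the binary mask of one connected component; the abstract does not confirm this, and the function name `hu_moments` and the toy mask are hypothetical. The sketch uses NumPy only and is not the authors' implementation.

```python
import numpy as np


def hu_moments(mask):
    """Return Hu's seven moment invariants of a 2-D mask (assumed feature set)."""
    mask = np.asarray(mask, dtype=float)
    ys, xs = np.nonzero(mask)          # row (y) and column (x) indices of foreground pixels
    w = mask[ys, xs]                   # pixel weights (all 1.0 for a binary mask)
    m00 = w.sum()
    if m00 == 0:
        raise ValueError("mask contains no foreground pixels")

    # Centring on the centroid gives translation invariance.
    dx = xs - (xs * w).sum() / m00
    dy = ys - (ys * w).sum() / m00

    def eta(p, q):
        # Normalised central moment: dividing by m00^(1 + (p+q)/2) gives scale invariance.
        return (w * dx**p * dy**q).sum() / m00 ** (1 + (p + q) / 2.0)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)

    a, b = n30 + n12, n21 + n03        # sums that recur in the higher-order invariants
    c, d = n30 - 3 * n12, 3 * n21 - n03

    h1 = n20 + n02
    h2 = (n20 - n02) ** 2 + 4 * n11**2
    h3 = c**2 + d**2
    h4 = a**2 + b**2
    h5 = c * a * (a**2 - 3 * b**2) + d * b * (3 * a**2 - b**2)
    h6 = (n20 - n02) * (a**2 - b**2) + 4 * n11 * a * b
    h7 = d * a * (a**2 - 3 * b**2) - c * b * (3 * a**2 - b**2)
    return np.array([h1, h2, h3, h4, h5, h6, h7])


# Toy L-shaped component (hypothetical data, for illustration only).
component = np.zeros((8, 8), dtype=int)
component[1:7, 2] = 1
component[6, 2:6] = 1

features = hu_moments(component)
shifted = hu_moments(np.pad(component, ((3, 0), (5, 0))))   # same shape, translated
rotated = hu_moments(np.rot90(component))                   # same shape, rotated 90 degrees
```

In this sketch `features`, `shifted`, and `rotated` agree up to floating-point error, which is the property that would let a nearest neighbour or SVM classifier treat a component the same way regardless of its position or orientation on the page.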
