Hand-Written and Machine-Printed Text Classification in Architecture, Engineering & Construction Documents

In the AEC (Architecture, Engineering & Construction) industry, drawing documents serve as blueprints that guide the construction process. They also act as a graphical language that communicates ideas and information from one person to another. Text in AEC documents appears in both machine-printed and hand-written form. Because the recognition algorithms for machine-printed and hand-written text differ, the two types must be distinguished before a document is sent to the respective recognition system. In this paper we propose a novel approach for classifying machine-printed and hand-written text in AEC documents. Before classification, our system applies preprocessing that includes binarization, text-graphics separation, and word segmentation. Words are segmented based on certain structural properties of the Isothetic Covers (IC) that tightly enclose the words in a document. The grid size of the IC is selected through statistical analysis of the connected components of the document. Word-level Gabor filter based features are then extracted with spooling information for classification. A standard SVM-based classifier is used to classify the text. The task is performed at the word level on AEC documents, and we achieve an overall accuracy of 98.45%.
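
The word-level Gabor feature step can be illustrated with a minimal sketch in plain Python. The abstract does not give the filter bank or the statistics that were used, so everything here is an assumption chosen for illustration: the function names, the 7x7 kernel, the wavelength and sigma values, the four orientations, and the choice of mean absolute response plus standard deviation as features. The resulting vector is the kind of input that would then be passed to an SVM.

```python
import math

def gabor_kernel(size, theta, wavelength, sigma, gamma=0.5):
    """Real part of a Gabor kernel as a size x size list of lists (size odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate the coordinates by theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + gamma * gamma * yr * yr)
                                / (2 * sigma * sigma))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength))
        kernel.append(row)
    # Subtract the mean so that a flat region produces zero response.
    mean = sum(map(sum, kernel)) / (size * size)
    return [[v - mean for v in row] for row in kernel]

def filter_response(image, kernel):
    """Valid-mode correlation of a 2D image (list of lists) with the kernel."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    out = []
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            out.append(acc)
    return out

def gabor_features(image, orientations=4, size=7, wavelength=4.0, sigma=2.0):
    """Per orientation: mean absolute response and standard deviation.

    The parameter values are illustrative assumptions, not the paper's.
    """
    features = []
    for n in range(orientations):
        theta = n * math.pi / orientations
        resp = filter_response(image, gabor_kernel(size, theta, wavelength, sigma))
        mean_abs = sum(abs(r) for r in resp) / len(resp)
        mean = sum(resp) / len(resp)
        std = math.sqrt(sum((r - mean) ** 2 for r in resp) / len(resp))
        features.extend([mean_abs, std])
    return features

# Toy 16x16 "word image" made of vertical strokes (hypothetical data).
word = [[1.0 if x % 4 < 2 else 0.0 for x in range(16)] for _ in range(16)]
vec = gabor_features(word)  # 8 values: (mean_abs, std) for each of 4 orientations
```

On the toy image, the filter whose carrier varies along the horizontal axis (the first orientation) responds far more strongly than the one rotated by 90 degrees, which is the kind of directional stroke information such features are meant to capture. A real system would run this on segmented word images, typically with an optimized image-processing library rather than nested loops.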
