Handwritten and machine printed text separation from Kannada document images

Separating handwritten and machine-printed (H&P) text in document images is a preprocessing step that improves the performance of OCR systems. This paper demonstrates the suitability of frequency-domain features for classifying H&P text words. We propose features based on the wavelet-like discrete cosine transform (WDCT). We conduct experiments on a dataset of 2000 text words in Kannada, a popular South Indian script, using a k-NN classifier. The efficacy of the frequency-domain features is validated experimentally: the WDCT features achieve a classification accuracy of 99.50% under ten-fold cross-validation.
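The overall pipeline (frequency-domain features, a k-NN classifier, ten-fold cross-validation) can be illustrated with a minimal sketch. It is not the WDCT method itself, whose definition is not given here: as a stand-in it uses a plain 2D DCT-II and the log-energy of the four quadrants of the coefficient plane. The toy images, the function names (`dct2`, `band_features`, `knn_predict`, `cross_val_accuracy`, `toy_word`), and the choice of k = 3 are all assumptions made for illustration.

```python
import math
import random


def dct2(block):
    """Unnormalized 2D DCT-II of a square image given as a list of rows."""
    n = len(block)
    # basis[k][i] = cos(pi * (i + 0.5) * k / n), the 1D DCT-II kernel
    basis = [[math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)]
             for k in range(n)]
    # Transform along each row, then along each column.
    rows = [[sum(row[i] * basis[k][i] for i in range(n)) for k in range(n)]
            for row in block]
    return [[sum(rows[i][v] * basis[u][i] for i in range(n)) for v in range(n)]
            for u in range(n)]


def band_features(img):
    """Log-energy of the four DCT quadrants (a stand-in, not the paper's WDCT)."""
    coeffs = dct2(img)
    n = len(coeffs)
    h = n // 2

    def log_energy(r0, r1, c0, c1):
        total = sum(coeffs[u][v] ** 2
                    for u in range(r0, r1) for v in range(c0, c1))
        return math.log1p(total)

    return [log_energy(0, h, 0, h), log_energy(0, h, h, n),
            log_energy(h, n, 0, h), log_energy(h, n, h, n)]


def knn_predict(train_x, train_y, x, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(range(len(train_x)),
                     key=lambda j: math.dist(train_x[j], x))[:k]
    votes = [train_y[j] for j in nearest]
    return max(set(votes), key=votes.count)


def cross_val_accuracy(xs, ys, folds=10, k=3, seed=0):
    """Fraction of samples classified correctly across shuffled folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for f in range(folds):
        test = idx[f::folds]
        held_out = set(test)
        train = [j for j in idx if j not in held_out]
        train_x = [xs[j] for j in train]
        train_y = [ys[j] for j in train]
        correct += sum(knn_predict(train_x, train_y, xs[j], k) == ys[j]
                       for j in test)
    return correct / len(xs)


def toy_word(rng, handwritten, n=8):
    """Hypothetical toy image: clean stripes, with more random flips if 'handwritten'."""
    img = [[1.0 if col % 2 == 0 else 0.0 for col in range(n)] for _ in range(n)]
    for _ in range(12 if handwritten else 1):
        r, c = rng.randrange(n), rng.randrange(n)
        img[r][c] = 1.0 - img[r][c]
    return img


rng = random.Random(0)
labels = [i % 2 for i in range(40)]  # 0 = "printed", 1 = "handwritten"
features = [band_features(toy_word(rng, bool(lab))) for lab in labels]
accuracy = cross_val_accuracy(features, labels)
print(f"10-fold accuracy on toy data: {accuracy:.2f}")
```

The synthetic data only exercises the code path; it says nothing about accuracy on real Kannada word images, where the features would be computed from binarized and size-normalized word crops.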
