Handwritten characters extraction from form based on line shape characteristics

Problem statement: Data entry forms are a convenient and successful tool for collecting information: the sheets are filled in by hand with a pen. Among the most important fields in these forms are the data filled boxes. Extracting the handwriting from data entry forms is important for many purposes, such as documenting and archiving, and it is also important when it is needed as input to the handwriting recognition process. Approach: A simple and effective approach is presented to extract handwritten characters, including digits and letters of any language, from the data filled boxes of a data entry form and to deal with overlaps between the handwritten characters and the boxes' lines. The proposed approach is based on line shape characteristics: it detects and removes the vertical and horizontal straight lines of the boxes while preserving the curved lines that represent the handwritten characters. The problem of handwritten characters overlapping the boxes' lines is solved using morphological dilation to reconstruct the characters broken by the removal of the boxes' lines. Results: Experimental results demonstrated that the proposed approach can extract handwriting from data filled boxes with an overall extraction rate of 94.052% on a collection of 150 forms. Conclusion: The proposed algorithm has been successfully implemented and tested, and it achieves the objective of extracting handwriting of any language from data filled boxes. However, this work could not deal with situations in which a character touches its immediately neighboring characters.
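The two steps named in the approach (removing straight box lines, then dilating to repair strokes that crossed them) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a binarized, deskewed image stored as a list of rows of 0/1 values, treats any horizontal or vertical run of ink at least `min_len` pixels long as a box line, and uses a 3x3 dilation. The function names, the `min_len` threshold, and the 3x3 neighborhood are all hypothetical choices made for illustration.

```python
def find_line_pixels(img, min_len):
    """Collect (row, col) pixels lying on straight horizontal or vertical
    ink runs of at least min_len pixels (assumed to be box lines)."""
    h, w = len(img), len(img[0])
    hits = set()
    # Horizontal runs: scan each row.
    for r in range(h):
        c = 0
        while c < w:
            if img[r][c]:
                start = c
                while c < w and img[r][c]:
                    c += 1
                if c - start >= min_len:
                    hits.update((r, k) for k in range(start, c))
            else:
                c += 1
    # Vertical runs: scan each column.
    for c in range(w):
        r = 0
        while r < h:
            if img[r][c]:
                start = r
                while r < h and img[r][c]:
                    r += 1
                if r - start >= min_len:
                    hits.update((k, c) for k in range(start, r))
            else:
                r += 1
    return hits


def remove_lines(img, min_len):
    """Erase long straight runs; short or curved strokes are kept."""
    hits = find_line_pixels(img, min_len)
    return [[0 if (r, c) in hits else v for c, v in enumerate(row)]
            for r, row in enumerate(img)]


def dilate(img):
    """Binary dilation with a 3x3 neighborhood (assumed structuring element)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if img[r][c]:
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if 0 <= rr < h and 0 <= cc < w:
                            out[rr][cc] = 1
    return out


# Toy example: a horizontal box line on row 4 crossed by a diagonal stroke.
size = 9
form = [[0] * size for _ in range(size)]
for c in range(size):
    form[4][c] = 1          # box line
for i in range(2, 7):
    form[i][i] = 1          # handwritten stroke crossing the line at (4, 4)

cleaned = remove_lines(form, min_len=6)   # line removed, stroke broken at (4, 4)
restored = dilate(cleaned)                # gap at (4, 4) closed again
```

In this toy case the removal step deletes the crossing pixel together with the box line, and the dilation closes the gap from the neighboring stroke pixels. Dilation also thickens every remaining stroke, so a real pipeline would likely need a more selective reconstruction step than this sketch shows.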
