Semantic analysis based forms information retrieval and classification

Abstract

Data entry forms are used in all types of enterprises to collect information from hundreds of customers on a daily basis. The forms are filled in by hand by the customers, so having human operators transfer this information into computers manually is laborious and time-consuming. It is also expensive, and human errors can introduce serious flaws. Automatic interpretation of scanned forms has improved the speed and accuracy of many real-world applications, such as keyword spotting, sorting of postal addresses, script matching, and writer identification. This research examines different strategies for extracting customer information from scanned forms and for interpreting and classifying it. The extracted information is segmented into characters for classification and finally stored as database records for further processing. This paper presents a detailed discussion of these semantic-analysis-based strategies for forms processing. Finally, new directions are recommended for future research.
