Consensus Network Based Hypotheses Combination for Arabic Offline Handwriting Recognition

Offline handwriting recognition (OHR) is an extremely challenging task because of variations in writing style, writing device and material, and noise introduced during scanning and collection. Because these challenges are so diverse, a single recognition technique is highly unlikely to address all the characteristics of real-world handwritten documents. One should therefore design several systems, each addressing specific challenges in the handwritten corpus, and then combine the hypotheses from these diverse systems. To that end, we present an innovative approach for combining hypotheses from multiple handwriting recognition systems. Our approach first generates a consensus network from the hypotheses of a diverse set of handwriting recognition systems. We then decode the consensus network to produce the best possible hypothesis under a given error criterion. Experimental results on an Arabic OHR task show that our combination algorithm outperforms the NIST ROVER technique and yields a 7% relative reduction in word error rate over the single best OHR system.
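The two steps described above, building a consensus network and then decoding it, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the alignment procedure, the system weights, or the error criterion, so the sketch assumes plain edit-distance alignment against the first hypothesis and weighted voting per slot, which is closer to ROVER-style voting than to a full consensus decoder. The names `align`, `build_network`, `decode`, and the `<eps>` token are hypothetical, and the English word sequences merely stand in for recognizer output.

```python
from collections import Counter

EPS = "<eps>"  # hypothetical token meaning "no word in this slot"


def align(slots, hyp):
    """Edit-distance alignment of a word list against the network slots.

    Returns a list of operations: "match" (word goes into an existing slot),
    "skip" (slot receives no word), or "ins" (word needs a new slot).
    """
    n, m = len(slots), len(hyp)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = cost[i - 1][0] + (0 if EPS in slots[i - 1] else 1)
        back[i][0] = "skip"
    for j in range(1, m + 1):
        cost[0][j] = j
        back[0][j] = "ins"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            slot, word = slots[i - 1], hyp[j - 1]
            options = [
                (cost[i - 1][j - 1] + (0 if word in slot else 1), "match"),
                (cost[i - 1][j] + (0 if EPS in slot else 1), "skip"),
                (cost[i][j - 1] + 1, "ins"),
            ]
            # On ties, keep the first option, so "match" is preferred.
            cost[i][j], back[i][j] = min(options, key=lambda o: o[0])
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        op = back[i][j]
        ops.append(op)
        if op == "match":
            i, j = i - 1, j - 1
        elif op == "skip":
            i -= 1
        else:
            j -= 1
    ops.reverse()
    return ops


def build_network(hyps, weights):
    """Merge hypotheses into a list of slots mapping word -> summed weight."""
    slots = [Counter({w: weights[0]}) for w in hyps[0]]
    seen = weights[0]  # total weight of the systems already merged
    for hyp, wt in zip(hyps[1:], weights[1:]):
        merged, i, j = [], 0, 0
        for op in align(slots, hyp):
            if op == "match":
                slot = slots[i]
                slot[hyp[j]] += wt
                i, j = i + 1, j + 1
            elif op == "skip":
                slot = slots[i]
                slot[EPS] += wt
                i += 1
            else:  # "ins": earlier systems implicitly voted for no word here
                slot = Counter({EPS: seen, hyp[j]: wt})
                j += 1
            merged.append(slot)
        slots = merged
        seen += wt
    return slots


def decode(slots):
    """Pick the highest-weight entry in each slot and drop empty slots."""
    best = (max(slot, key=slot.get) for slot in slots)
    return [w for w in best if w != EPS]


# Three hypothetical system outputs for the same text line, equal weights.
hyps = [
    "the cat sat on mat".split(),
    "the cat sat on the mat".split(),
    "the hat sat on the mat".split(),
]
print(decode(build_network(hyps, [1, 1, 1])))
# -> ['the', 'cat', 'sat', 'on', 'the', 'mat']
```

In this toy case the combined output differs from the first and third hypotheses: the majority restores the word missing from the first system and outvotes the substitution made by the third. A fuller decoder would replace the per-slot vote with scores that reflect system confidence and the chosen error criterion.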
