RWTH OCR: A Large Vocabulary Optical Character Recognition System for Arabic Scripts

We present a novel large-vocabulary OCR system that implements a confidence- and margin-based discriminative training approach for adapting the models of an HMM-based recognition system to multiple fonts, different handwriting styles, and their variations. Most current HMM approaches are HTK-based systems that are trained with maximum likelihood (ML) and adapt their models to different writing styles using writer adaptive training, unsupervised clustering, or additional writer-specific data. Here, discriminative training based on the maximum mutual information (MMI) and minimum phone error (MPE) criteria is used instead. For model adaptation during decoding, we propose unsupervised confidence-based discriminative training within a two-pass decoding process. Additionally, we use neural network-based features extracted by a hierarchical multi-layer perceptron (MLP) network, either in a hybrid MLP/HMM approach or to discriminatively retrain a Gaussian HMM system in a tandem approach. The proposed framework and methods are evaluated for closed-vocabulary isolated handwritten word recognition on the IFN/ENIT Arabic handwriting database, where the word error rate is decreased by more than 50% relative to an ML-trained baseline system. Preliminary results for large-vocabulary Arabic machine-printed text recognition are presented on a novel, publicly available newspaper database.
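The confidence-based two-pass idea mentioned in the abstract can be illustrated with a minimal sketch. It shows only the selection step between the passes: first-pass hypotheses whose confidence reaches a threshold are kept as unsupervised adaptation data for the second pass. The `Hypothesis` type, the `select_for_adaptation` function, the score range, and the threshold value are all assumptions made for illustration; the abstract does not specify how confidences are computed or how the retraining itself is carried out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    """A first-pass recognition result (hypothetical structure)."""
    segment_id: str
    text: str
    confidence: float  # assumed to be a posterior-style score in [0, 1]


def select_for_adaptation(first_pass: List[Hypothesis],
                          threshold: float = 0.8) -> List[Hypothesis]:
    """Keep only hypotheses whose confidence reaches the threshold.

    The retained hypotheses would act as automatically generated
    transcriptions for adapting the model before a second decoding pass.
    The threshold of 0.8 is an arbitrary illustrative value.
    """
    return [h for h in first_pass if h.confidence >= threshold]


# Toy first-pass output with made-up identifiers and scores.
first_pass = [
    Hypothesis("seg-1", "word-a", 0.95),
    Hypothesis("seg-2", "word-b", 0.40),
    Hypothesis("seg-3", "word-c", 0.80),
]
adaptation_data = select_for_adaptation(first_pass)
selected_ids = [h.segment_id for h in adaptation_data]
```

In this toy run, `seg-2` is discarded because its score falls below the threshold, so a likely recognition error does not contaminate the adaptation data.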
