An iterative multimodal framework for the transcription of handwritten historical documents

Transcribing historical documents is one of the most interesting applications of Handwritten Text Recognition (HTR) because of its value for humanities research. An alternative way to transcribe ancient manuscripts is to dictate their contents and process the speech with Automatic Speech Recognition (ASR) techniques. Both approaches employ similar models (Hidden Markov Models and n-grams) and decoding processes (Viterbi decoding), so the two modalities can be combined with little difficulty. In this work, we explore using the recognition results of one modality to restrict the decoding process of the other, and we apply this process iteratively. The iterative multimodal alternatives yield significantly better results than the baseline unimodal systems and better results than the non-iterative alternatives.
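
The iterative idea can be illustrated with a minimal toy sketch. It is not the system evaluated in this work: it assumes each modality's output is reduced to a per-position map from candidate word to log-score, whereas a real HMM/n-gram recogniser would typically exchange richer structures such as word graphs. All names (`Scores`, `restricted_decode`, `best_path`, `iterative_decode`), the additive score combination, and the fallback rule are hypothetical choices made only for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical toy representation: for each word position, a map from
# candidate word to log-score. Real systems would likely use word graphs.
Scores = List[Dict[str, float]]


def restricted_decode(own: Scores, prior: Optional[Scores], n_best: int) -> Scores:
    """Keep the n best candidates per position.

    When `prior` is given, only words that also appear in the prior at the
    same position are allowed, and the prior's log-score is added to the
    modality's own score (an illustrative assumption, not the paper's rule).
    """
    output: Scores = []
    for pos, candidates in enumerate(own):
        if prior is None:
            pool = dict(candidates)
        else:
            pool = {
                word: score + prior[pos][word]
                for word, score in candidates.items()
                if word in prior[pos]
            }
            if not pool:
                # Assumption: if the restriction removes every candidate,
                # fall back to unrestricted decoding at this position.
                pool = dict(candidates)
        ranked = sorted(pool.items(), key=lambda kv: kv[1], reverse=True)
        output.append(dict(ranked[:n_best]))
    return output


def best_path(hyp: Scores) -> List[str]:
    """Return the highest-scoring word at each position."""
    return [max(cands, key=cands.get) for cands in hyp]


def iterative_decode(htr: Scores, asr: Scores,
                     n_best: int = 2, max_passes: int = 6) -> List[str]:
    """Alternate between modalities, each pass restricted by the previous one,
    until the best transcription stops changing or max_passes is reached."""
    modalities = [htr, asr]
    prior: Optional[Scores] = None
    previous: Optional[List[str]] = None
    for i in range(max_passes):
        prior = restricted_decode(modalities[i % 2], prior, n_best)
        current = best_path(prior)
        if current == previous:
            break
        previous = current
    return previous


if __name__ == "__main__":
    # Made-up scores for a two-word line; not taken from any corpus.
    htr = [{"the": -0.1, "she": -0.5}, {"cot": -0.4, "cat": -0.6, "cut": -0.9}]
    asr = [{"the": -0.2, "a": -1.5}, {"cat": -0.3, "cap": -0.6, "cot": -2.0}]
    print(iterative_decode(htr, asr))
```

In this made-up example the handwriting scores alone prefer "cot" for the second word, while the speech pass, restricted to the handwriting candidates, prefers "cat"; the next handwriting pass agrees and the loop stops. Adding the prior's score on every pass counts evidence more than once, which a real system would have to handle with proper weighting.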
