Combining Handwriting and Speech Recognition for Transcribing Historical Handwritten Documents

In this paper, a multimodal combination technique based on confusion networks is presented. Results on two datasets of differing difficulty show that the proposed technique provides draft transcriptions that are similar to or better than those of a previously proposed approach, allowing for a faster transcription process.
