Word graphs size impact on the performance of handwriting document applications

Abstract Two document processing applications are considered: computer-assisted transcription of text images (CATTI) and keyword spotting (KWS), for transcribing and indexing handwritten documents, respectively. Instead of working directly on the handwriting images, both employ metadata structures called word graphs (WGs), which are obtained using segmentation-free handwritten text recognition technology based on N-gram language models and hidden Markov models. A WG contains most of the relevant information of the original text (line) image required by CATTI and KWS but, if it is too large, the computational cost of generating and using it can become unaffordable. Conversely, if it is too small, relevant information may be lost, leading to a reduction in CATTI or KWS performance. We study the trade-off between WG size and performance in terms of the effectiveness and efficiency of CATTI and KWS. Results show that small, computationally cheap WGs can be used without losing the excellent CATTI and KWS performance achieved with huge WGs.
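To make the size trade-off concrete, the following is a minimal sketch of a word graph as a weighted directed acyclic graph, together with a simple beam-style pruning step. It is an illustration only, not the procedure used in the paper: the toy edges, their log-scores, and the function names `best_scores`, `prune`, and `count_paths` are all assumptions introduced here. An edge is kept when the best complete path passing through it scores within `beam` of the overall best path, so a smaller beam yields a smaller graph that encodes fewer alternative transcriptions.

```python
import math
from collections import defaultdict

# Hypothetical toy word graph for one text line.
# Each edge is (from_node, to_node, word, log_score).
# Node ids are assumed to be in topological order (from_node < to_node).
EDGES = [
    (0, 1, "the", -0.2),
    (0, 1, "he", -1.8),
    (1, 2, "word", -0.4),
    (1, 2, "ward", -1.5),
    (2, 3, "graph", -0.3),
    (2, 3, "grape", -2.5),
]
START, END = 0, 3


def best_scores(edges, start, end):
    """Best (Viterbi) log-score from start to each node and from each node to end."""
    nodes = {n for e in edges for n in e[:2]}
    fwd = {n: -math.inf for n in nodes}
    bwd = {n: -math.inf for n in nodes}
    fwd[start] = 0.0
    bwd[end] = 0.0
    # Forward pass: visit edges by increasing source node.
    for u, v, _, s in sorted(edges, key=lambda e: e[0]):
        fwd[v] = max(fwd[v], fwd[u] + s)
    # Backward pass: visit edges by decreasing target node.
    for u, v, _, s in sorted(edges, key=lambda e: e[1], reverse=True):
        bwd[u] = max(bwd[u], bwd[v] + s)
    return fwd, bwd


def prune(edges, start, end, beam):
    """Keep edges whose best path through them is within `beam` of the best path."""
    fwd, bwd = best_scores(edges, start, end)
    best = fwd[end]
    return [e for e in edges if fwd[e[0]] + e[3] + bwd[e[1]] >= best - beam]


def count_paths(edges, start, end):
    """Number of distinct start-to-end paths (word sequences) in the graph."""
    paths = defaultdict(int)
    paths[start] = 1
    for u, v, _, _ in sorted(edges, key=lambda e: e[0]):
        paths[v] += paths[u]
    return paths[end]


if __name__ == "__main__":
    for beam in (5.0, 1.2):
        kept = prune(EDGES, START, END, beam)
        print(beam, len(kept), count_paths(kept, START, END))
```

In this toy graph a wide beam keeps every edge, while a narrower beam discards the low-scoring alternatives "he" and "grape". That is the kind of size reduction whose effect on CATTI and KWS the abstract describes.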
