Label transition and selection pruning and automatic decoding parameter optimization for time-synchronous Viterbi decoding

Hidden Markov Model (HMM)-based classifiers have been successfully used for sequential labeling problems such as speech recognition and optical character recognition for decades. They have been especially successful in the domains where the segmentation is not known or difficult to obtain, since, in principle, all possible segmentation points can be taken into account. However, the benefit comes with a non-negligible computational cost. In this paper, we propose simple yet effective new pruning algorithms to speed up decoding with HMM-based classifiers of up to 95% relative over a baseline. As the number of tunable decoding parameters increases, it becomes more difficult to optimize the parameters for each configuration. We also propose a novel technique to estimate the parameters based on a loss value without relying on a grid search.

[1]  Hermann Ney,et al.  Dynamic programming search for continuous speech recognition , 1999, IEEE Signal Process. Mag..

[2]  Marc'Aurelio Ranzato,et al.  Large Scale Distributed Deep Networks , 2012, NIPS.

[3]  Hermann Ney,et al.  Look-ahead techniques for fast beam search , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[4]  Hermann Ney,et al.  Improvements in beam search , 1994, ICSLP.

[5]  Hervé Bourlard,et al.  Connectionist Speech Recognition: A Hybrid Approach , 1993 .

[6]  Richard M. Schwartz,et al.  An Omnifont Open-Vocabulary OCR System for English and Arabic , 1999, IEEE Trans. Pattern Anal. Mach. Intell..

[7]  Dmitriy Genzel,et al.  HMM-based script identification for OCR , 2013, MOCR '13.

[8]  Raymond W. Smith Hybrid Page Layout Analysis via Tab-Stop Detection , 2009, 2009 10th International Conference on Document Analysis and Recognition.

[9]  Owen Taylor Pango, an open-source Unicode text layout engine , 2004 .

[10]  Thorsten Brants,et al.  Large Language Models in Machine Translation , 2007, EMNLP.

[11]  Wolfgang Macherey,et al.  Lattice-based Minimum Error Rate Training for Statistical Machine Translation , 2008, EMNLP.

[12]  Philip A. Chou,et al.  Document Image Decoding Using Markov Source Models , 1994, IEEE Trans. Pattern Anal. Mach. Intell..

[13]  R. Smith,et al.  An Overview of the Tesseract OCR Engine , 2007, Ninth International Conference on Document Analysis and Recognition (ICDAR 2007).