Errgrams – A Way to Improving ASR for Highly Inflected Dravidian Languages

In this paper, we present the results of our experiments with automatic speech recognition (ASR) for Telugu, a highly inflected Dravidian language. First, we propose a new metric for evaluating ASR performance on inflectional languages, the Inflectional Word Error Rate (IWER), which takes into account whether an incorrectly recognized word corresponds to the same lexicon lemma as the reference word. We also present results achieved by applying a novel method, errgrams, to ASR lattices. Taking confidence scores into account, the method learns typical error patterns, which are then used for lattice correction applied just before standard lattice rescoring. Our confidence measures are based on word posteriors and were improved by applying antimodels trained on anti-examples generated by the standard N-gram language model. For Telugu, we decreased the WER from 45.2% to 40.4% (4.8% absolute) and the IWER from 41.6% to 39.5% (2.1% absolute) relative to the baseline. All improvements are statistically significant under all three standard NIST significance tests for ASR.
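The abstract does not spell out exactly how IWER treats an error that shares the reference word's lemma. As a minimal sketch, assuming that such a same-lemma substitution is charged a reduced, hypothetical cost `lemma_sub_cost` (0.5 here) instead of the full cost of 1, both WER and an IWER-style score can be computed from one weighted Levenshtein alignment over words. The `lemma` function and the toy English tokens below are hypothetical stand-ins for a real Telugu lemmatizer and lexicon; they are not taken from the paper.

```python
from typing import Callable, Sequence


def weighted_edit_distance(
    ref: Sequence[str],
    hyp: Sequence[str],
    sub_cost: Callable[[str, str], float],
) -> float:
    """Minimum-cost word alignment: insertions and deletions cost 1,
    a match or substitution costs sub_cost(ref_word, hyp_word)."""
    n, m = len(ref), len(hyp)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = float(i)  # delete all reference words so far
    for j in range(1, m + 1):
        d[0][j] = float(j)  # insert all hypothesis words so far
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + 1.0,  # deletion
                d[i][j - 1] + 1.0,  # insertion
                d[i - 1][j - 1] + sub_cost(ref[i - 1], hyp[j - 1]),  # match/substitution
            )
    return d[n][m]


def wer(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Standard word error rate: every non-identical substitution costs 1."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return weighted_edit_distance(ref, hyp, lambda r, h: 0.0 if r == h else 1.0) / len(ref)


def iwer(
    ref: Sequence[str],
    hyp: Sequence[str],
    lemma: Callable[[str], str],
    lemma_sub_cost: float = 0.5,  # hypothetical weight, not from the paper
) -> float:
    """IWER-style score (assumption): a substitution with the correct lemma
    but the wrong inflected form is only partially penalized."""
    if not ref:
        raise ValueError("reference must be non-empty")

    def cost(r: str, h: str) -> float:
        if r == h:
            return 0.0
        if lemma(r) == lemma(h):
            return lemma_sub_cost
        return 1.0

    return weighted_edit_distance(ref, hyp, cost) / len(ref)


# Toy example with a hypothetical lemma table.
toy_lemmas = {"walks": "walk", "walked": "walk"}
ref = ["the", "dog", "walks", "home"]
hyp = ["the", "dog", "walked", "house"]
print(wer(ref, hyp))  # 2 substitutions / 4 words = 0.5
print(iwer(ref, hyp, lambda w: toy_lemmas.get(w, w)))  # (0.5 + 1) / 4 = 0.375
```

Because no alignment is ever charged more under this IWER-style cost than under plain WER, the score can only be lower than or equal to WER, which matches the direction of the reported baseline figures (41.6% IWER vs. 45.2% WER).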
