Don't multiply lightly: Quantifying problems with the acoustic model assumptions in speech recognition

We describe a series of experiments simulating data from the standard Hidden Markov Model (HMM) framework used for speech recognition. Starting with a set of test transcriptions, we begin by simulating every step of the generative process. In each subsequent experiment, we substitute a real component for a simulated component (real state durations rather than simulating from the transition models, for example), and compare the word error rates of the resulting data, thus quantifying the relative costs of each modeling assumption. A novel sampling process allows us to test the independence assumptions of the HMM, which appear to present far more serious problems than the other data/model mismatches.

[1]  Steve J. Young,et al.  Large vocabulary continuous speech recognition using HTK , 1994, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing.

[2]  Jeff A. Bilmes,et al.  Buried Markov models: a graphical-modeling approach to automatic speech recognition , 2003, Comput. Speech Lang..

[3]  Yuqing Gao,et al.  Maximum entropy direct models for speech recognition , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[4]  Daniel Povey,et al.  Large scale discriminative training of hidden Markov models for speech recognition , 2002, Comput. Speech Lang..

[5]  Don McAllaster,et al.  STUDIES WITH FABRICATED SWITCHBOARD DATA: EXPLORING SOURCES OF MODEL-DATA MISMATCH , 1998 .

[6]  Alex Acero,et al.  Hidden conditional random fields for phone classification , 2005, INTERSPEECH.

[7]  Mari Ostendorf,et al.  From HMM's to segment models: a unified view of stochastic modeling for speech recognition , 1996, IEEE Trans. Speech Audio Process..

[8]  Martin Russell,et al.  A segmental HMM for speech pattern modelling , 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[9]  Andreas Stolcke,et al.  SRILM - an extensible language modeling toolkit , 2002, INTERSPEECH.

[10]  Mark J. F. Gales,et al.  Semi-tied covariance matrices for hidden Markov models , 1999, IEEE Trans. Speech Audio Process..

[11]  Steve Young,et al.  The HTK book version 3.4 , 2006 .

[12]  William Kruskal,et al.  Miracles and Statistics: The Casual Assumption of Independence , 1988 .

[13]  Geoffrey Zweig,et al.  Advances in speech transcription at IBM under the DARPA EARS program , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[14]  B. Efron The jackknife, the bootstrap, and other resampling plans , 1987 .