On naive Bayes in speech recognition

The currently dominant speech recognition technology, hidden Markov modeling, has long been criticized for its simplistic assumptions about speech, and especially for the naive Bayes combination rule inherent in it. Many sophisticated alternative models have been suggested over the last decade. These, however, have demonstrated only modest improvements and brought no paradigm shift in technology. The goal of this paper is to examine why HMM performs so well in spite of its incorrect bias due to the naive Bayes assumption. To do this we create an algorithmic framework that allows us to experiment with alternative combination schemes and helps us understand the factors that influence recognition performance. From the findings we argue that the bias peculiar to the naive Bayes rule is not really detrimental to phoneme classification performance. Furthermore, it ensures consistent behavior in outlier modeling, allowing efficient management of insertion and deletion errors.

[1]  Martin J. Russell,et al.  Probabilistic-trajectory segmental HMMs , 1999, Comput. Speech Lang..

[2]  Rish,et al.  An analysis of data characteristics that affect naive Bayes performance , 2001 .

[3]  Alex Acero,et al.  Spoken Language Processing , 2001 .

[4]  Yifan Gong,et al.  Assessing the importance of the segmentation probability in segment-based speech recognition , 1998, Speech Commun..

[5]  Hsiao-Wuen Hon,et al.  Speaker-independent phone recognition using hidden Markov models , 1989, IEEE Trans. Acoust. Speech Signal Process..

[6]  Pedro J. Moreno,et al.  On the use of support vector machines for phonetic classification , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[7]  Mari Ostendorf,et al.  From HMM's to segment models: a unified view of stochastic modeling for speech recognition , 1996, IEEE Trans. Speech Audio Process..

[8]  László Tóth,et al.  A Discriminative Segmental Speech Model and Its Application to Hungarian Number Recognition , 2000, TSD.

[9]  James R. Glass,et al.  A probabilistic framework for feature-based speech recognition , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[10]  Josef Kittler,et al.  Combining multiple classifiers by averaging or by multiplying? , 2000, Pattern Recognit..

[11]  Pedro M. Domingos,et al.  On the Optimality of the Simple Bayesian Classifier under Zero-One Loss , 1997, Machine Learning.

[12]  Daniel Povey,et al.  Large scale discriminative training for speech recognition , 2000 .

[13]  D. Hand,et al.  Idiot's Bayes—Not So Stupid After All? , 2001 .