Bayesian Networks for Discrete Observation Distributions in Speech Recognition

In speech recognition, the state emission probability distributions of hidden Markov models are traditionally defined over continuous random variables and modeled with Gaussian mixtures. A single Gaussian is unimodal and cannot accurately capture complex multimodal inter-feature dependencies; mixtures of Gaussians are used in these cases, but they model the dependencies only loosely and inefficiently. Graphical models provide a precise and simple mechanism for modeling the dependencies among two or more variables. This paper proposes using discrete random variables as observations, with graphical models to extract the internal dependence structure of the feature vectors. To keep the model tractable, the speech features are quantized to a small number of levels. The quantized features also increase robustness against noise uncertainty, and discrete random variables allow the joint statistics of the observation distributions to be learned. The paper presents a method for estimating a graphical model with a constrained number of dependencies, which is a special kind of Bayesian network. Experimental results show that this modeling yields better performance than standard baseline systems.
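As a rough illustration of the idea of quantizing features and then learning a dependence structure with a limited number of edges, the sketch below quantizes each feature to a few levels and builds a dependence tree from pairwise mutual information, in the spirit of the classic Chow and Liu dependence-tree approach. This is not the estimation method of the paper, whose details are not given in the abstract. The function names (`quantize`, `mutual_information`, `dependence_tree`), the quantile-based binning, the choice of four levels, and the synthetic data are all assumptions made for the example.

```python
import numpy as np


def quantize(features, levels=4):
    """Map each feature column to integer codes 0..levels-1 (quantile bins, an assumption)."""
    codes = np.empty(features.shape, dtype=int)
    inner = np.linspace(0.0, 1.0, levels + 1)[1:-1]  # interior quantile points
    for j in range(features.shape[1]):
        edges = np.quantile(features[:, j], inner)
        codes[:, j] = np.searchsorted(edges, features[:, j], side="right")
    return codes


def mutual_information(x, y, levels):
    """Empirical mutual information (in nats) between two discrete code vectors."""
    joint = np.zeros((levels, levels))
    np.add.at(joint, (x, y), 1.0)          # joint histogram
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)  # marginal of x, shape (levels, 1)
    py = joint.sum(axis=0, keepdims=True)  # marginal of y, shape (1, levels)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / (px @ py)[mask])))


def dependence_tree(codes, levels):
    """Maximum spanning tree over pairwise mutual information (Prim's algorithm).

    Returns (parent, child) edges rooted at feature 0, so each feature has at
    most one parent: a tree-structured network with a constrained number of
    dependencies.
    """
    d = codes.shape[1]
    mi = np.zeros((d, d))
    for i in range(d):
        for j in range(i + 1, d):
            mi[i, j] = mi[j, i] = mutual_information(codes[:, i], codes[:, j], levels)
    in_tree, edges = [0], []
    while len(in_tree) < d:
        best = None
        for p in in_tree:
            for c in range(d):
                if c not in in_tree and (best is None or mi[p, c] > mi[best]):
                    best = (p, c)
        edges.append(best)
        in_tree.append(best[1])
    return edges


# Synthetic data (hypothetical): feature 1 depends strongly on feature 0,
# feature 2 is independent of both.
rng = np.random.default_rng(0)
f0 = rng.normal(size=2000)
f1 = f0 + 0.1 * rng.normal(size=2000)
f2 = rng.normal(size=2000)
features = np.column_stack([f0, f1, f2])

codes = quantize(features, levels=4)
edges = dependence_tree(codes, levels=4)
print(edges)
```

With a tree, each quantized feature conditions on at most one other feature, so every conditional probability table has at most levels × levels entries, which is one way to see why a small number of quantization levels keeps such a model tractable.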
