Bayesian Inference in Mixtures-of-Experts and Hierarchical Mixtures-of-Experts Models With an Applic

Abstract: Machine classification of acoustic waveforms as speech events is often difficult due to context dependencies. Here a vowel recognition task with multiple speakers is studied via the use of a class of modular and hierarchical systems referred to as mixtures-of-experts and hierarchical mixtures-of-experts models. The statistical model underlying the systems is a mixture model in which both the mixture coefficients and the mixture components are generalized linear models. A full Bayesian approach is used as a basis of inference and prediction. Computations are performed using Markov chain Monte Carlo methods. A key benefit of this approach is the ability to obtain a sample from the posterior distribution of any functional of the parameters of the given model. In this way, more information is obtained than can be provided by a point estimate. Also avoided is the need to rely on a normal approximation to the posterior as the basis of inference. This is particularly important in cases where the posteri...
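The model structure described in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a single-level mixture of experts in which the mixture coefficients come from a multinomial logit (softmax) gate and each expert is itself a multinomial logit classifier. All names (`me_class_probs`, `posterior_functional_sample`, the `draws` list) and all numeric values are hypothetical. The `draws` list stands in for parameter values that an MCMC sampler would produce; no sampler is implemented here.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of real-valued scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def linear(weights, x):
    # One linear predictor per row; the last coefficient of each row is an intercept.
    return [sum(w * v for w, v in zip(row, x + [1.0])) for row in weights]

def me_class_probs(x, gate_w, expert_ws):
    # Mixture coefficients g_j(x) from the gate (a multinomial logit GLM).
    g = softmax(linear(gate_w, x))
    # Class probabilities from each expert (each also a multinomial logit GLM).
    per_expert = [softmax(linear(w, x)) for w in expert_ws]
    n_classes = len(per_expert[0])
    # p(y = k | x) = sum_j g_j(x) * p_j(y = k | x)
    return [sum(g[j] * per_expert[j][k] for j in range(len(g)))
            for k in range(n_classes)]

def posterior_functional_sample(x, draws):
    # Evaluating a functional of the parameters at every posterior draw
    # gives a sample from the posterior distribution of that functional.
    return [me_class_probs(x, d["gate"], d["experts"]) for d in draws]

# Hypothetical stand-in for two MCMC draws: 2 inputs, 2 experts, 3 classes.
draws = [
    {"gate": [[0.5, -0.2, 0.0], [-0.5, 0.2, 0.0]],
     "experts": [[[1.0, 0.0, 0.1], [0.0, 1.0, 0.0], [-1.0, -1.0, 0.0]],
                 [[0.2, 0.3, 0.0], [0.1, -0.4, 0.2], [0.0, 0.0, 0.0]]]},
    {"gate": [[0.4, -0.1, 0.1], [-0.3, 0.3, 0.0]],
     "experts": [[[0.9, 0.1, 0.0], [0.1, 0.8, 0.0], [-0.9, -1.1, 0.1]],
                 [[0.3, 0.2, 0.0], [0.0, -0.5, 0.1], [0.1, 0.0, 0.0]]]},
]
sample = posterior_functional_sample([1.2, -0.7], draws)
```

Each element of `sample` is a vector of class probabilities for the same input under one parameter draw, so the spread across elements reflects posterior uncertainty that a single point estimate would not show.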
