Hierarchical Mixtures of Experts and the EM Algorithm

We present a tree-structured architecture for supervised learning. The statistical model underlying the architecture is a hierarchical mixture model in which both the mixture coefficients and the mixture components are generalized linear models (GLIMs). Learning is treated as a maximum likelihood problem; in particular, we present an Expectation-Maximization (EM) algorithm for adjusting the parameters of the architecture. We also develop an on-line learning algorithm in which the parameters are updated incrementally. Comparative simulation results are presented in the robot dynamics domain.
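To make the E- and M-steps concrete, the following is a minimal illustrative sketch of a one-level mixture of linear-Gaussian experts with a softmax gating network, written with NumPy. The function name, the random initialization, and the gradient-ascent update for the gate (standing in for the IRLS/GLIM fits used in the paper) are all assumptions made for this sketch, not the paper's exact procedure; the hierarchical architecture nests the same E- and M-steps at each level of the tree.

```python
import numpy as np

def fit_mixture_of_experts(X, y, n_experts=3, n_iter=50, seed=0):
    """EM for a one-level mixture of linear-Gaussian experts with a
    softmax gating network.  Illustrative sketch only: the paper fits
    both experts and gate as GLIMs via IRLS; here the gate is updated
    by a few gradient-ascent steps instead."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])                 # inputs with bias column
    W = rng.normal(scale=0.1, size=(n_experts, d + 1))   # expert regression weights
    V = rng.normal(scale=0.1, size=(n_experts, d + 1))   # gating weights
    sigma2 = np.full(n_experts, np.var(y) + 1e-6)        # expert noise variances

    for _ in range(n_iter):
        # E-step: posterior responsibility of each expert for each data point
        logits = Xb @ V.T
        logits -= logits.max(axis=1, keepdims=True)
        g = np.exp(logits); g /= g.sum(axis=1, keepdims=True)      # gate probabilities
        mu = Xb @ W.T                                              # expert means
        loglik = -0.5 * ((y[:, None] - mu) ** 2 / sigma2
                         + np.log(2 * np.pi * sigma2))
        h = g * np.exp(loglik - loglik.max(axis=1, keepdims=True))
        h /= h.sum(axis=1, keepdims=True)                          # responsibilities

        # M-step for the experts: responsibility-weighted least squares
        for j in range(n_experts):
            Hw = h[:, j][:, None] * Xb
            W[j] = np.linalg.solve(Xb.T @ Hw + 1e-6 * np.eye(d + 1), Hw.T @ y)
            resid = y - Xb @ W[j]
            sigma2[j] = (h[:, j] * resid ** 2).sum() / (h[:, j].sum() + 1e-12)

        # M-step for the gate: gradient ascent on the weighted cross-entropy
        # (a stand-in for the IRLS inner loop of the paper)
        for _ in range(10):
            logits = Xb @ V.T
            logits -= logits.max(axis=1, keepdims=True)
            g = np.exp(logits); g /= g.sum(axis=1, keepdims=True)
            V += 0.1 / n * (h - g).T @ Xb

    return W, V, sigma2
```

Calling `fit_mixture_of_experts(X, y)` on a regression data set returns the fitted expert weights, gating weights, and per-expert variances; in the full hierarchical model, each expert above would itself be a gated mixture, and the E-step would multiply gate probabilities down the tree before normalizing.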
