Localised Mixtures of Experts for Mixture of Regressions

In this paper, an alternative to mixtures of experts (ME), called localised mixture of experts, is studied. It corresponds to an ME model in which the experts are linear regressions and the gating network is a Gaussian classifier. The distribution of the regressors can then be taken to be Gaussian, so that the joint distribution of inputs and outputs is a Gaussian mixture. This yields a substantial speed-up of the EM algorithm for localised ME. Conversely, when studying Gaussian mixtures under certain constraints, maximum likelihood estimation can be carried out with the standard EM algorithm for mixtures of experts. Several useful constrained models are presented, together with the corresponding modifications to the EM algorithm.
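The connection described above can be sketched in code: fit an ordinary Gaussian mixture by EM on the joint data $(x, y)$, then read off the localised-ME predictor, since each component's conditional $p(y \mid x)$ is a linear regression and the Gaussian gating weights are the components' posterior probabilities given $x$. The following is a minimal numpy sketch under these assumptions; the function names and the diagonal regularisation term are illustrative, not from the paper.

```python
import numpy as np

def fit_joint_gmm(Z, K, n_iter=100, seed=0):
    """EM for a K-component Gaussian mixture on joint data Z = [x | y]."""
    rng = np.random.default_rng(seed)
    n, d = Z.shape
    mu = Z[rng.choice(n, K, replace=False)]            # initialise means at data points
    Sigma = np.stack([np.cov(Z.T) + 1e-6 * np.eye(d) for _ in range(K)])
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: log responsibilities under each Gaussian component
        log_r = np.empty((n, K))
        for k in range(K):
            diff = Z - mu[k]
            inv = np.linalg.inv(Sigma[k])
            _, logdet = np.linalg.slogdet(Sigma[k])
            log_r[:, k] = (np.log(pi[k])
                           - 0.5 * (d * np.log(2 * np.pi) + logdet)
                           - 0.5 * np.einsum('ij,jk,ik->i', diff, inv, diff))
        log_r -= log_r.max(axis=1, keepdims=True)      # stabilise before exponentiating
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: closed-form updates of weights, means, covariances
        Nk = r.sum(axis=0)
        pi = Nk / n
        mu = (r.T @ Z) / Nk[:, None]
        for k in range(K):
            diff = Z - mu[k]
            Sigma[k] = (r[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(d)
    return pi, mu, Sigma

def localised_me_predict(x, pi, mu, Sigma, dx):
    """Conditional mean E[y|x]: Gaussian-gated mixture of linear experts."""
    K = len(pi)
    x = np.atleast_2d(x)
    n = x.shape[0]
    gate = np.empty((n, K))
    exp_y = np.empty((n, K))
    for k in range(K):
        mx, my = mu[k, :dx], mu[k, dx:]
        Sxx = Sigma[k][:dx, :dx]
        Syx = Sigma[k][dx:, :dx]
        A = Syx @ np.linalg.inv(Sxx)                   # expert's regression slope
        diff = x - mx
        inv = np.linalg.inv(Sxx)
        _, logdet = np.linalg.slogdet(Sxx)
        gate[:, k] = pi[k] * np.exp(
            -0.5 * np.einsum('ij,jk,ik->i', diff, inv, diff)
            - 0.5 * (dx * np.log(2 * np.pi) + logdet))
        exp_y[:, k] = (my + diff @ A.T).ravel()        # expert prediction a_k x + b_k
    gate /= gate.sum(axis=1, keepdims=True)            # Gaussian classifier gating
    return (gate * exp_y).sum(axis=1)
```

The speed-up comes from the M-step: every parameter update is in closed form, whereas the generic ME gating network requires an inner iterative fit at each EM step.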
