Use of localized gating in mixture of experts networks

The mixture-of-experts (MOE) is a popular architecture for function approximation. In the standard architecture, each expert is gated via a softmax function, so its domain of application is not well localized. This paper summarizes several recent results showing the advantages of using localized gating instead. These include a natural framework for model selection and adaptation by growing and shrinking the number of experts, modeling of non-stationary environments, improved generalization performance, and confidence intervals on network outputs. These results substantially increase the scope and power of MOE networks. Several simulation results are presented to support the theoretical arguments.
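To make the contrast concrete, the sketch below compares standard softmax gating, where each gate is a global linear function of the input, with one common form of localized gating based on normalized Gaussian kernels. This is a minimal NumPy illustration under assumed spherical covariances; the variable names, kernel form, and toy data are illustrative assumptions, not the exact formulation from the paper.

```python
import numpy as np

def softmax_gate(x, V):
    """Standard MOE gating: g_i(x) = exp(v_i . x) / sum_j exp(v_j . x).
    Each gate score is linear in x, so an expert's region of influence
    is a half-space rather than a localized neighborhood."""
    scores = V @ x
    scores -= scores.max()                      # numerical stability
    e = np.exp(scores)
    return e / e.sum()

def localized_gate(x, mus, sigmas, alphas):
    """Localized gating via normalized Gaussian kernels (an assumed,
    illustrative form): g_i(x) proportional to alpha_i * N(x; mu_i, sigma_i^2 I).
    Each expert's influence decays with distance from its center mu_i,
    giving a compact, interpretable region of applicability."""
    d2 = ((x - mus) ** 2).sum(axis=1)           # squared distance to each center
    dim = x.shape[0]
    dens = alphas * np.exp(-0.5 * d2 / sigmas**2) / (np.sqrt(2 * np.pi) * sigmas) ** dim
    return dens / dens.sum()

# Toy comparison on a 2-D input with three experts
rng = np.random.default_rng(0)
x = np.array([0.2, -0.4])
V = rng.normal(size=(3, 2))                     # softmax gate weight vectors
mus = rng.normal(size=(3, 2))                   # kernel centers
sigmas = np.full(3, 0.5)                        # kernel widths
alphas = np.full(3, 1.0 / 3.0)                  # mixing priors
print("softmax gates:  ", softmax_gate(x, V))
print("localized gates:", localized_gate(x, mus, sigmas, alphas))
```

Because each localized gate's responsibility is concentrated near its kernel center, experts whose kernels attract little data can be pruned and new kernels can be spawned in poorly covered regions, which is what makes the growing and shrinking of the expert pool natural in this setting.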
