Structural adaptation in mixture of experts

The "mixture of experts" framework provides a modular and flexible approach to function approximation. However, the important problem of determining the appropriate number and complexity of experts has not been fully explored. In this paper, we consider a localized form of the gating network that can perform function approximation tasks very well with only one layer of experts. Certain measures for the smooth functioning of the training algorithm to train this model are described first. We then propose two techniques to overcome the model selection problem in the mixture of experts architecture. In the first technique, we present an efficient way to grow expert networks to come up with an appropriate number of experts for a given problem. In the second approach, we start with a certain number of experts and present methods to prune experts which become less useful and also add on experts which would be more effective. Simulation results are presented which support the techniques proposed. We observe that the growing/pruning approach yields substantially better results than the standard approach even when the final network sizes are chosen to be the same.
