Combining locally trained neural networks by introducing a reject class

This paper presents a new strategy for building and combining a local committee from a given dataset. Training the local committee proceeds in two stages: active data partitioning and recombination through an additional reject class. Active data partitioning is a preprocessing step that splits the given dataset into several similar subsets using active learning. The additional reject class plays an important role in assigning a focused area to each individual network of the committee. To combine the outputs of the individual networks, we use a sum-rule criterion, assuming that the outputs of the individual networks approximate a posteriori Bayesian probabilities. All learning procedures are based on the active learning paradigm. Experiments are performed on two real-world datasets from the UCI machine learning repository. The results show that the active data partitioning and recombination strategy is very effective for building a local committee and that the combined result outperforms other algorithms, although it can be affected by the training error level ε.
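
The abstract does not spell out the combination formula, so the following is only a minimal sketch of one plausible reading: each committee member outputs posterior estimates over the real classes plus its own reject class (used for examples outside its focused area), and the sum rule is applied to the non-reject outputs. The function name `combine_sum_rule` and the example numbers are illustrative assumptions, not the authors' code.

```python
import numpy as np

def combine_sum_rule(outputs, n_classes):
    """Sum-rule combination of committee outputs with a reject class.

    outputs: shape (n_members, n_classes + 1); the last column is each
    member's reject-class output for examples outside its focused area.
    Outputs are assumed to approximate posterior class probabilities,
    following the paper's Bayesian interpretation.
    """
    outputs = np.asarray(outputs, dtype=float)
    # Drop each member's reject-class column and sum the remaining
    # class posteriors across members (the sum rule).
    class_scores = outputs[:, :n_classes].sum(axis=0)
    return int(np.argmax(class_scores))

# Example: three members, two real classes plus a reject class.
member_outputs = [
    [0.7, 0.1, 0.2],  # member 1 is confident in class 0
    [0.1, 0.2, 0.7],  # member 2 mostly rejects (example outside its area)
    [0.6, 0.3, 0.1],  # member 3 also favours class 0
]
print(combine_sum_rule(member_outputs, n_classes=2))  # -> 0
```

A member that rejects an example contributes little to the real-class scores, so the decision is effectively delegated to the members whose focused area covers that example.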
