Cooperative coevolution of artificial neural network ensembles for pattern classification

This paper presents a cooperative coevolutionary approach to designing neural network ensembles. Cooperative coevolution is a recent paradigm in evolutionary computation that allows the effective modeling of cooperative environments. Although, in theory, a single neural network with a sufficient number of hidden neurons can solve any problem, in practice constructing an appropriate network is too hard for many real-world problems. For such problems, neural network ensembles are a successful alternative. Nevertheless, the design of neural network ensembles is a complex task. In this paper, we propose a general framework for designing neural network ensembles by means of cooperative coevolution. The proposed model has two main objectives: first, to improve the combination of the trained individual networks; second, to evolve those networks cooperatively, encouraging collaboration among them instead of training each network separately. To favor cooperation among the networks, each network is evaluated throughout the evolutionary process using a multiobjective method. For each network, different objectives are defined that consider not only its performance on the given problem but also its cooperation with the rest of the networks. In addition, a population of ensembles is evolved, improving the combination of networks and obtaining subsets of networks that form ensembles which perform better than the combination of all the evolved networks. The proposed model is applied to ten real-world classification problems of very different natures, taken from the UCI machine learning repository and the Proben1 benchmark set. On all of them, the model outperforms standard ensembles in terms of generalization error, and the obtained ensembles are also smaller.
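To make the two-population scheme concrete, below is a minimal, hypothetical Python sketch of the idea described above; it is not the authors' implementation. Random linear classifiers stand in for trained networks, ensembles are index subsets over the network population, and a simple scalar sum of own accuracy and a cooperation term stands in for the paper's multiobjective (Pareto-style) evaluation. All names, sizes, and the toy dataset are illustrative assumptions.

```python
# Sketch of cooperative coevolution of an ensemble: one population of
# "networks" (here, linear classifiers) and one population of ensembles
# (index subsets), evolved together. Not the paper's actual algorithm.
import numpy as np

rng = np.random.default_rng(0)

# Toy binary classification data (a stand-in for a UCI/Proben1 problem).
X = rng.normal(size=(200, 5))
y = (X @ rng.normal(size=5) + 0.3 * rng.normal(size=200) > 0).astype(int)

N_NETS, N_ENSEMBLES, ENS_SIZE, GENERATIONS = 20, 10, 5, 30

def predict(w, X):
    return (X @ w > 0).astype(int)

def accuracy(w, X, y):
    return (predict(w, X) == y).mean()

def ensemble_accuracy(members, X, y):
    # Majority vote over the member networks.
    votes = np.mean([predict(nets[i], X) for i in members], axis=0)
    return ((votes > 0.5).astype(int) == y).mean()

# Population of networks (weight vectors) and population of ensembles
# (subsets of network indices).
nets = [rng.normal(size=5) for _ in range(N_NETS)]
ensembles = [rng.choice(N_NETS, ENS_SIZE, replace=False)
             for _ in range(N_ENSEMBLES)]

for gen in range(GENERATIONS):
    # Evaluate ensembles by their combined (voted) accuracy.
    ens_fit = [ensemble_accuracy(e, X, y) for e in ensembles]

    # Evaluate networks on two objectives: own accuracy plus a cooperation
    # term (mean fitness of the ensembles the network belongs to). A scalar
    # sum is used here for brevity where the paper uses a multiobjective method.
    net_fit = []
    for i in range(N_NETS):
        own = accuracy(nets[i], X, y)
        hosting = [f for e, f in zip(ensembles, ens_fit) if i in e]
        coop = np.mean(hosting) if hosting else 0.0
        net_fit.append(own + coop)

    # Replace the worst half of each population with mutated copies of the best.
    order = np.argsort(net_fit)
    for bad, good in zip(order[:N_NETS // 2], order[N_NETS // 2:]):
        nets[bad] = nets[good] + 0.1 * rng.normal(size=5)
    e_order = np.argsort(ens_fit)
    for bad, good in zip(e_order[:N_ENSEMBLES // 2], e_order[N_ENSEMBLES // 2:]):
        child = ensembles[good].copy()
        child[rng.integers(ENS_SIZE)] = rng.integers(N_NETS)  # swap one member
        ensembles[bad] = child

best = ensembles[int(np.argmax([ensemble_accuracy(e, X, y) for e in ensembles]))]
print("best ensemble members:", sorted(best),
      "accuracy:", ensemble_accuracy(best, X, y))
```

Note how the cooperation term rewards a network for belonging to good ensembles, not merely for high individual accuracy, and how the ensemble population searches over subsets of networks rather than always combining them all; these are the two objectives the abstract names.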
