A Note on Generalization Loss When Evolving Adaptive Pattern Recognition Systems

Evolutionary computing provides powerful methods for designing pattern recognition systems. This design process is typically based on finite sample data and therefore bears the risk of overfitting. This paper aims to raise awareness of the various types of overfitting and to provide guidelines for dealing with them. We restrict our considerations to the predominant scenario in which fitness computations are based on point estimates. Three sources of generalization loss when evolving learning machines, namely overfitting to the training, test, and final selection data, are identified, discussed, and demonstrated experimentally. The importance of a pristine hold-out data set for selecting the final result from the evolved candidates is highlighted. It is shown that it may be beneficial to restrict this last selection step to a subset of the evolved candidates.
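
The three-way data split described above can be made concrete with a small sketch. The following Python code is a minimal illustration, not the authors' exact experimental setup: it assumes scikit-learn is available, and the (1+1)-style evolution loop, mutation scale, iteration count, and top-5 subset size are all hypothetical choices. It evolves the SVM hyperparameters C and gamma using a point-estimate fitness on a test split, and then performs the final selection on a pristine hold-out set that the evolution never touched, restricted to a subset of the best evolved candidates.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Three disjoint data sets: training data for fitting each candidate,
# test data for the point-estimate fitness driving the evolution, and
# a pristine hold-out set reserved for the final selection step.
X, y = make_classification(n_samples=1200, n_features=20, random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_test, X_hold, y_test, y_hold = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

def fitness(log_params):
    """Point-estimate fitness: accuracy on the evolution's test data."""
    C, gamma = np.exp(log_params)
    clf = SVC(C=C, gamma=gamma).fit(X_train, y_train)
    return clf.score(X_test, y_test)

# A deliberately simple (1+1)-style evolution strategy in log-parameter space.
parent = np.array([0.0, -2.0])            # log C, log gamma
parent_fit = fitness(parent)
archive = [(parent.copy(), parent_fit)]    # all evolved candidates

for _ in range(50):
    child = parent + rng.normal(scale=0.5, size=2)  # Gaussian mutation
    child_fit = fitness(child)
    archive.append((child.copy(), child_fit))
    if child_fit >= parent_fit:            # plus-selection
        parent, parent_fit = child, child_fit

# Final selection on the pristine hold-out data, restricted to the few
# candidates with the best fitness (a subset, as the abstract suggests)
# to limit overfitting to the selection data itself.
top = sorted(archive, key=lambda t: -t[1])[:5]
best = max(
    top,
    key=lambda t: SVC(C=np.exp(t[0][0]), gamma=np.exp(t[0][1]))
                  .fit(X_train, y_train).score(X_hold, y_hold),
)
print("selected log-parameters:", best[0])
```

Selecting among only the top few candidates, rather than the whole archive, keeps the number of comparisons made against the hold-out data small, which is one way to read the paper's suggestion that restricting the last selection step to a subset of the evolved candidates can be beneficial.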
