Feature selection forcing overtraining may help to improve performance

One of the main drawbacks of machine learning systems is the negative effect caused by overtraining: if the points in the dataset are fitted perfectly, generalization performance is usually poor. We propose to take advantage of overtraining, together with feature selection, to improve the performance of a learning system. The main idea rests on the hypothesis that when the dataset is fitted as closely as possible, the system is forced to exploit all the available variables as much as it can. Noisy and useless variables can then be detected when generalization improves after the system is prevented from using them; by forcing overtraining, such variables should stand out more clearly. To test this hypothesis, we performed several feature selection experiments using feedforward neural networks, with Sequential Backward Selection as the selection procedure. Experimental results on several real-world problems suggest that the hypothesis is well-founded. Ironically, forcing overtraining may help to achieve good generalization performance.
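As a concrete illustration of the procedure described above, here is a minimal sketch of Sequential Backward Selection driven by a deliberately overtrained feedforward network, assuming scikit-learn is available. The synthetic dataset, network size, and training budget are illustrative assumptions, not the paper's actual experimental setup.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

def fit_overtrained_net(X, y):
    # Push the network toward zero training error: no early stopping,
    # zero tolerance, and a generous iteration budget.
    net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1500, tol=0.0,
                        early_stopping=False, random_state=0)
    return net.fit(X, y)

def sequential_backward_selection(X_tr, y_tr, X_val, y_val):
    # Greedily drop, one at a time, the feature whose removal most
    # improves validation accuracy; keep the best subset seen overall.
    remaining = list(range(X_tr.shape[1]))
    best_subset = list(remaining)
    best_score = fit_overtrained_net(X_tr, y_tr).score(X_val, y_val)
    while len(remaining) > 1:
        trials = []
        for f in remaining:
            subset = [g for g in remaining if g != f]
            net = fit_overtrained_net(X_tr[:, subset], y_tr)
            trials.append((net.score(X_val[:, subset], y_val), f))
        score, worst = max(trials)   # best score after removing one feature
        remaining.remove(worst)
        if score >= best_score:
            best_score, best_subset = score, list(remaining)
    return best_subset, best_score

# Synthetic data with informative, redundant, and pure-noise features.
X, y = make_classification(n_samples=300, n_features=10, n_informative=4,
                           n_redundant=2, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3,
                                            random_state=0)
subset, score = sequential_backward_selection(X_tr, y_tr, X_val, y_val)
print("selected features:", subset, "validation accuracy:", round(score, 3))

In this sketch, the features whose removal raises validation accuracy are exactly the candidates the hypothesis flags as noisy or useless: the overtrained network has been forced to rely on them, so taking them away exposes their cost.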
