Review of Input Variable Selection Methods for Artificial Neural Networks

The choice of input variables is a fundamental and crucial consideration in identifying the optimal functional form of statistical models. The task of selecting input variables is common to the development of all statistical models, and largely depends on discovering relationships within the available data that identify suitable predictors of the model output. In the case of parametric or semi-parametric empirical models, the difficulty of the input variable selection task is somewhat alleviated by the a priori assumption of the functional form of the model, which is based on some physical interpretation of the underlying system or process being modelled. In the case of artificial neural networks (ANNs), and other similarly data-driven statistical modelling approaches, however, no such assumption is made regarding the structure of the model. Instead, the input variables are selected from the available data, and the model is developed subsequently.

The difficulty of selecting input variables arises due to (i) the number of available variables, which may be very large; (ii) correlations between potential input variables, which create redundancy; and (iii) variables that have little or no predictive power. Variable subset selection has been a longstanding issue in fields of applied statistics dealing with inference and linear regression (Miller, 1984), and the advent of ANN models has only served to create new challenges in this field. The non-linearity, inherent complexity and non-parametric nature of ANN regression make it difficult to apply many existing analytical variable selection methods.

The difficulty of selecting input variables is further exacerbated during ANN development, since the task of selecting inputs is often delegated to the ANN itself during the learning phase. A popular notion is that an ANN is adequately capable of identifying redundant and noise variables during training, and that the trained network will use only the salient input variables. ANN architectures can be built with arbitrary flexibility and can be successfully trained using any combination of input variables (assuming they are good predictors). Consequently, allowances are often made for a large number of input variables, in the belief that such flexibility and redundancy yields a more robust model. Such pragmatism is perhaps symptomatic of the popularisation of ANN models through machine learning, rather than statistical learning theory. ANN models are too often developed without due consideration of the effect that the choice of input variables has on model complexity, learning difficulty, and the performance of the subsequently trained ANN.
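To make the relevance and redundancy issues concrete, the sketch below implements a simple greedy filter in the spirit of Battiti's mutual information feature selection (MIFS) scheme [26]: each candidate input is scored by its estimated mutual information with the output, penalised by its redundancy with the inputs already chosen. This is a minimal illustration only, assuming Python with scikit-learn's mutual_info_regression estimator; the function name mifs_select, the selection size k and the redundancy weight beta are illustrative choices, not part of the reviewed methods.

    import numpy as np
    from sklearn.feature_selection import mutual_info_regression

    def mifs_select(X, y, k=5, beta=0.5):
        # Greedy MIFS-style filter: at each step, choose the candidate
        # input maximising I(x_i; y) - beta * sum_s I(x_i; x_s), where
        # the sum runs over the inputs already selected.
        relevance = mutual_info_regression(X, y)  # I(x_i; y) for every input
        selected, remaining = [], list(range(X.shape[1]))
        while remaining and len(selected) < k:
            scores = []
            for i in remaining:
                # Redundancy of candidate i with the already-selected inputs.
                redundancy = (mutual_info_regression(X[:, selected], X[:, i]).sum()
                              if selected else 0.0)
                scores.append(relevance[i] - beta * redundancy)
            best = remaining[int(np.argmax(scores))]
            selected.append(best)
            remaining.remove(best)
        return selected

    # Toy usage: x1 is a noisy copy of x0 (redundant); only x0 and x2 inform y.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 8))
    X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=500)
    y = np.sin(X[:, 0]) + 0.5 * X[:, 2]
    print(mifs_select(X, y, k=3))  # x0 and x2 should be picked before x1

Setting beta to zero reduces the scheme to ranking inputs by relevance alone, which would readily admit the redundant copy x1; the redundancy penalty is what addresses issue (ii) above.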

[1] Holger R. Maier, et al. Optimal division of data for neural network models in water resources applications, 2002.

[2] Holger R. Maier, et al. Data splitting for artificial neural networks using SOM-based stratified sampling, 2010, Neural Networks.

[3] Armando Freitas da Rocha, et al. Neural Nets, 1992, Lecture Notes in Computer Science.

[4] A. S. Weigend, et al. Selecting Input Variables Using Mutual Information and Nonparametric Density Estimation, 1994.

[5] C. E. Shannon. A mathematical theory of communication, 1948, Bell System Technical Journal.

[6] C. Mallows. Some Comments on Cp, 2000, Technometrics.

[7] G. Schwarz. Estimating the Dimension of a Model, 1978.

[8] Gwilym M. Jenkins, et al. Time series analysis, forecasting and control, 1972.

[9] Jude W. Shavlik, et al. Using neural networks for data mining, 1997, Future Gener. Comput. Syst.

[10] Alan J. Miller. Selection of subsets of regression variables, 1984.

[11] Holger R. Maier, et al. Forecasting chlorine residuals in a water distribution system using a general regression neural network, 2006, Math. Comput. Model.

[12] Holger R. Maier, et al. Selection of input variables for data driven models: An average shifted histogram partial mutual information estimator approach, 2009.

[13] Holger R. Maier, et al. Neural networks for the prediction and forecasting of water resource variables: a review of modelling issues and applications, 2000, Environ. Model. Softw.

[14] Ashish Sharma, et al. Seasonal to interannual rainfall probabilistic forecasts for improved water supply management: Part 1 — A strategy for system predictor identification, 2000.

[15] Yann LeCun, et al. Optimal Brain Damage, 1989, NIPS.

[16] Miguel Á. Carreira-Perpiñán, et al. A Review of Dimension Reduction Techniques, 2009.

[17] R. Tibshirani. Regression Shrinkage and Selection via the Lasso, 1996.

[18] Jarkko Tikka, et al. Input variable selection methods for construction of interpretable regression models, 2008.

[19] G. Darbellay. An estimator of the mutual information based on a criterion for independence, 1999.

[20] Akira Kawamura, et al. Neural Networks for Rainfall Forecasting by Atmospheric Downscaling, 2004.

[21] Chong-Ho Choi, et al. Input feature selection for classification problems, 2002, IEEE Trans. Neural Networks.

[22] Ron Kohavi, et al. Wrappers for feature subset selection, 1997, Artif. Intell.

[23] B. Silverman. Density estimation for statistics and data analysis, 1986.

[24] J. B. Nixon, et al. Investigation into the relationship between chlorine decay and water distribution parameters using data driven methods, 2006, Math. Comput. Model.

[25] Pat Langley, et al. Selection of Relevant Features and Examples in Machine Learning, 1997, Artif. Intell.

[26] Roberto Battiti, et al. Using mutual information for selecting features in supervised neural net learning, 1994, IEEE Trans. Neural Networks.

[27] Kari Torkkola, et al. Feature Extraction by Non-Parametric Mutual Information Maximization, 2003, J. Mach. Learn. Res.

[28] Ashish Darbari, et al. Rule Extraction from Trained ANN: A Survey, 2000.

[29] Isabelle Guyon, et al. An Introduction to Variable and Feature Selection, 2003, J. Mach. Learn. Res.

[30] Thomas P. Trappenberg, et al. Selecting inputs for modeling using normalized higher order statistics and independent component analysis, 2001, IEEE Trans. Neural Networks.

[31] Robert Tibshirani, et al. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Edition, 2001, Springer Series in Statistics.

[32] Donald F. Specht, et al. A general regression neural network, 1991, IEEE Trans. Neural Networks.

[33] D. Agrafiotis, et al. Variable selection for QSAR by artificial ant colony systems, 2002, SAR and QSAR in Environmental Research.

[34] Jian-Hui Jiang, et al. Modified Ant Colony Optimization Algorithm for Variable Selection in QSAR Modeling: QSAR Studies of Cyclooxygenase Inhibitors, 2005, J. Chem. Inf. Model.

[35] Zvi Drezner, et al. Model Specification Searches Using Ant Colony Optimization Algorithms, 2003.

[36] Chris H. Q. Ding, et al. Minimum Redundancy Feature Selection from Microarray Gene Expression Data, 2005, J. Bioinform. Comput. Biol.

[37] Greer B. Kingston. Bayesian artificial neural networks in water resources engineering, 2006.

[38] Holger R. Maier, et al. Input determination for neural network models in water resources applications. Part 1 — background and methodology, 2005.

[39] Thomas M. Cover, et al. Elements of Information Theory (Wiley Series in Telecommunications and Signal Processing), 2006.

[40] Thomas P. Trappenberg, et al. Input variable selection: mutual information and linear mixing measures, 2006, IEEE Transactions on Knowledge and Data Engineering.

[41] I. K. Fodor, et al. A Survey of Dimension Reduction Techniques, 2002.

[42] Trevor Hastie, et al. The Elements of Statistical Learning, 2001.