Selecting Input Variables Using Mutual Information and Nonparametric Density

In learning problems where a connectionist network is trained with a finite-sized training set, better generalization performance is often obtained when unneeded weights in the network are eliminated. One source of unneeded weights comes from the inclusion of input variables that provide little information about the output variables. We propose a method for identifying and eliminating these input variables. The method first determines the relationship between input and output variables using nonparametric density estimation and then measures the relevance of input variables using the information-theoretic concept of mutual information. We present results from our method on a simple toy problem and a nonlinear time series.
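The two-step idea in the abstract (estimate densities nonparametrically, then score each input by its mutual information with the output) can be illustrated with a minimal sketch. This is not the authors' implementation: the Gaussian product kernel, the rule-of-thumb bandwidth, the resubstitution average, the function names, and the toy data (one relevant input, one pure-noise input) are all assumptions chosen for illustration.

```python
import math
import random


def _gauss(u, h):
    """Gaussian kernel with bandwidth h, evaluated at offset u."""
    return math.exp(-0.5 * (u / h) ** 2) / (h * math.sqrt(2.0 * math.pi))


def _rule_of_thumb_bandwidth(values):
    """Normal-reference bandwidth 1.06 * std * n^(-1/5) (an assumed choice)."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return 1.06 * std * n ** (-0.2)


def kde_mutual_information(xs, ys):
    """Estimate I(X;Y) in nats as the sample mean of log p(x,y) / (p(x) p(y)).

    All three densities are kernel density estimates evaluated at the
    sample points themselves (a resubstitution estimate).
    """
    n = len(xs)
    hx = _rule_of_thumb_bandwidth(xs)
    hy = _rule_of_thumb_bandwidth(ys)
    total = 0.0
    for i in range(n):
        kx = [_gauss(xs[i] - xs[j], hx) for j in range(n)]
        ky = [_gauss(ys[i] - ys[j], hy) for j in range(n)]
        p_x = sum(kx) / n                                # marginal density of X
        p_y = sum(ky) / n                                # marginal density of Y
        p_xy = sum(a * b for a, b in zip(kx, ky)) / n    # joint density
        total += math.log(p_xy / (p_x * p_y))
    return total / n


# Hypothetical toy data: the output depends on x1 only; x2 is independent noise.
rng = random.Random(0)
n_samples = 200
x1 = [rng.uniform(-math.pi, math.pi) for _ in range(n_samples)]
x2 = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
y = [math.sin(a) + 0.1 * rng.gauss(0.0, 1.0) for a in x1]

scores = {
    "x1": kde_mutual_information(x1, y),
    "x2": kde_mutual_information(x2, y),
}
for name, score in scores.items():
    print(f"{name}: estimated mutual information = {score:.3f} nats")
```

In a sketch like this, inputs whose estimated mutual information with the output is close to that of pure noise would be candidates for removal. The estimate is sensitive to the bandwidth and carries a positive bias on small samples, so scores are better compared against each other than read as absolute values.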
