Neural Networks (NN) have been used in a large variety of real-world applications, in which one may measure a potentially large number N of variables Xi. Probably not all Xi are equally informative: if one could select the n ≪ N "best" variables Xi, one could reduce the amount of data to gather and process, and hence reduce costs. Variable selection is thus an important issue in Pattern Recognition and Regression. It is also a complex problem: one needs a criterion to measure the value of a subset of variables, and that value will of course depend on the predictor or classifier subsequently used. Conventional variable selection techniques are based on statistical or heuristic tools [Fukunaga, 90]; the major difficulty comes from the intrinsic combinatorics of the problem. In this paper we show how to use NNs for variable selection, with a criterion based on an evaluation of each variable's usefulness. Various methods have been proposed to assess the value of a weight (e.g. the saliency [Le Cun et al., 90] of the Optimal Brain Damage (OBD) procedure); following similar ideas, we derive a method, called Optimal Cell Damage (OCD), which evaluates the usefulness of the input variables of a Multi-Layer Network and prunes the least useful ones. Variable selection is thus achieved during training of the classifier, ensuring that the selected set of variables matches the classifier's complexity; variable selection is thereby viewed as an extension of weight pruning. One can also take a regularization approach to variable selection, which we discuss elsewhere [Cibas et al., 94]. We illustrate our method on two relatively small problems, representative of relatively hard ones: prediction of a synthetic time series and classification of waveforms [Breiman et al., 84].
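To make the pruning idea concrete, the following is a minimal sketch, not the paper's exact procedure: it assumes the OBD per-weight saliency s_j = H_jj w_j² / 2 (with H_jj the diagonal of the Hessian of the error), aggregates the saliencies of the weights fanning out of each input cell into an input-cell saliency, and keeps the highest-ranked inputs. The function names, the aggregation by summation, and the use of a precomputed Hessian diagonal are illustrative assumptions.

```python
import numpy as np

def obd_weight_saliency(weights, hess_diag):
    # OBD per-weight saliency: s_j = H_jj * w_j^2 / 2
    return 0.5 * hess_diag * weights ** 2

def ocd_input_saliency(W1, H1_diag):
    # W1, H1_diag: (n_inputs, n_hidden) first-layer weights and
    # corresponding Hessian diagonal entries.
    # Input-cell saliency (assumed here): sum of the saliencies of the
    # weights fanning out of each input unit.
    return obd_weight_saliency(W1, H1_diag).sum(axis=1)

def prune_least_useful(W1, H1_diag, n_keep):
    # Rank input cells by saliency and keep the n_keep most useful.
    sal = ocd_input_saliency(W1, H1_diag)
    keep = np.sort(np.argsort(sal)[-n_keep:])
    return keep, sal
```

In practice the Hessian diagonal would be estimated after training (e.g. by the back-propagation-based approximation used for OBD), and pruning would alternate with retraining so the selected variable set matches the classifier's complexity.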
[1] Babak Hassibi et al. Second Order Derivatives for Network Pruning: Optimal Brain Surgeon. NIPS, 1992.
[2] David E. Rumelhart et al. Generalization by Weight-Elimination with Application to Forecasting. NIPS, 1990.
[3] P. Gallinari et al. Cooperation of neural nets and task decomposition. IJCNN-91-Seattle International Joint Conference on Neural Networks, 1991.
[4] Keinosuke Fukunaga et al. Statistical Pattern Recognition. Handbook of Pattern Recognition and Computer Vision, 1993.
[5] Chris Bishop et al. Exact Calculation of the Hessian Matrix for the Multilayer Perceptron. Neural Computation, 1992.
[6] Yves Chauvin. Dynamic Behavior of Constrained Back-Propagation Networks. NIPS, 1989.
[7] Yann LeCun et al. Optimal Brain Damage. NIPS, 1989.
[8] Patrick Gallinari et al. Variable selection with neural networks. Neurocomputing, 1996.