Feature selection using mutual information and neural networks

Reducing the dimensionality of the raw input variable space is an important step in pattern recognition and function approximation tasks, and is often dictated by practical feasibility. The purpose of this study is to investigate an information-theoretic approach to feature selection: we use mutual information (MI) as a pre-processing step for artificial neural networks. The main reason MI is not in wider use (except between two scalar variables) is computational: it requires the probability density functions of the variables, and estimating MI involves numerical integration of functions of these densities, which leads to high computational complexity. Because the maximal-dependency condition is difficult to implement directly, we first derive an equivalent form, the minimal-redundancy-maximal-relevance (mRMR) criterion, for first-order incremental feature selection. The feature selection methodology is hybridized with three classification and universal function approximation paradigms: the Multilayer Perceptron, the Radial Basis Function network, and the Support Vector Machine. We perform an extensive experimental comparison of the proposed hybrid algorithm on different problems: breast cancer classification, diabetes in Pima Indians, and arrhythmia.
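
As a concrete illustration of the first-order incremental scheme, the sketch below greedily selects, at each round, the feature x_j that maximizes its relevance I(x_j; c) to the class minus its mean redundancy with the already selected set S, i.e. I(x_j; c) - (1/|S|) * sum over x_i in S of I(x_j; x_i), which is the standard statement of the mRMR criterion. This is a minimal sketch, assuming scikit-learn's nearest-neighbor MI estimators in place of the density-based estimation discussed above; the function name mrmr_select and all variable names are illustrative, not taken from the paper.

```python
# Minimal greedy mRMR sketch (illustrative, not the paper's implementation).
import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

def mrmr_select(X, y, k):
    """Pick k features by first-order incremental mRMR:
    maximize relevance I(x_j; c) minus mean redundancy over selected set."""
    n_features = X.shape[1]
    relevance = mutual_info_classif(X, y)  # I(x_j; c) for every feature
    selected, remaining = [], list(range(n_features))
    # Cache pairwise feature-feature MI, computed lazily as needed.
    redundancy = np.zeros((n_features, n_features))
    computed = np.zeros((n_features, n_features), dtype=bool)
    for _ in range(k):
        best_j, best_score = None, -np.inf
        for j in remaining:
            if selected:
                for i in selected:
                    if not computed[i, j]:
                        # Estimate I(x_i; x_j), treating x_i as a continuous target.
                        redundancy[i, j] = mutual_info_regression(X[:, [j]], X[:, i])[0]
                        computed[i, j] = True
                red = np.mean([redundancy[i, j] for i in selected])
            else:
                red = 0.0  # first round: no redundancy term yet
            score = relevance[j] - red  # the mRMR criterion
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
        remaining.remove(best_j)
    return selected
```

On the first round S is empty, so the criterion reduces to pure relevance; thereafter the redundancy penalty steers the search away from features that duplicate information already captured. A typical call would be selected = mrmr_select(X, y, k=10), after which the reduced feature matrix X[:, selected] can be passed to any of the three learners above.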