Input feature selection for classification problems

Feature selection plays an important role in classifying systems such as neural networks (NNs). The attributes available to a classifier may be relevant, irrelevant, or redundant, and since real datasets can be huge, reducing the number of attributes by selecting only the relevant ones is desirable; doing so promises higher performance at lower computational cost. In this paper, we propose two feature selection algorithms. We analyze a limitation of the mutual information feature selector (MIFS) and study a method to overcome it. The first proposed algorithm makes better use of the mutual information between input attributes and output classes than MIFS does. We demonstrate that, when information is distributed uniformly, the proposed method can match the performance of the ideal greedy selection algorithm, at nearly the same computational cost as MIFS. In addition, we propose a second feature selection algorithm based on the Taguchi method, addressing the question of how to identify good features with as few experiments as possible. Both algorithms are applied to several classification problems and compared with MIFS. The two algorithms can also be combined to complement each other's limitations; the combined algorithm performed well in several experiments and should prove a useful method for selecting features in classification problems.
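For concreteness, the sketch below outlines the MIFS greedy criterion [9] that serves as the baseline analyzed here: at each step it selects the feature f maximizing I(C; f) − β·Σ_{s∈S} I(f; s), where S is the set of already-selected features and β weights the redundancy penalty. This is a minimal illustration assuming discrete-valued attributes (continuous attributes would first be quantized into bins); the function names and toy data are illustrative, not taken from the paper.

```python
import numpy as np

def mutual_information(x, y):
    """Estimate I(X; Y) in bits from two discrete 1-D arrays."""
    n = len(x)
    joint = {}
    for xi, yi in zip(x, y):
        joint[(xi, yi)] = joint.get((xi, yi), 0) + 1
    px, py = {}, {}
    for (xi, yi), c in joint.items():
        px[xi] = px.get(xi, 0) + c
        py[yi] = py.get(yi, 0) + c
    mi = 0.0
    for (xi, yi), c in joint.items():
        pxy = c / n
        mi += pxy * np.log2(pxy / ((px[xi] / n) * (py[yi] / n)))
    return mi

def mifs(features, classes, k, beta=0.5):
    """Greedily pick k feature indices maximizing
    I(C; f) - beta * sum over selected s of I(f; s)."""
    n_feat = features.shape[1]
    relevance = [mutual_information(features[:, i], classes)
                 for i in range(n_feat)]
    selected, remaining = [], list(range(n_feat))
    # Accumulated redundancy terms, updated as features are selected.
    redundancy = np.zeros(n_feat)
    for _ in range(k):
        best = max(remaining,
                   key=lambda i: relevance[i] - beta * redundancy[i])
        selected.append(best)
        remaining.remove(best)
        for i in remaining:
            redundancy[i] += mutual_information(features[:, i],
                                                features[:, best])
    return selected

# Toy usage: 200 samples, 5 discrete features; feature 0 determines the class.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(200, 5))
y = X[:, 0]
print(mifs(X, y, k=2))  # feature 0 is picked first
```

In the toy run, feature 0 fully determines the class, so it is selected first; the β-weighted redundancy term then steers later picks away from features that merely duplicate ones already chosen. How this penalty can misjudge redundancy is the limitation the first proposed algorithm addresses.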

[1] T. Imielinski et al., Database Mining: A Performance Perspective, IEEE Trans. Knowl. Data Eng., 1993.

[2] P. S. Yu et al., Data Mining: An Overview from a Database Perspective, IEEE Trans. Knowl. Data Eng., 1996.

[3] K. W. Bauer et al., Integrated feature architecture selection, IEEE Trans. Neural Networks, 1996.

[4] R. F. Gunst, Applied Regression Analysis, Technometrics, 1999.

[5] N. J. Nilsson, Artificial Intelligence, IFIP Congress, 1974.

[6] R. A. Fisher, The Design of Experiments, 1935.

[7] S. Thrun et al., The MONK's Problems: A Performance Comparison of Different Learning Algorithms, Tech. Rep. CMU-CS-91-197, Carnegie Mellon University, 1991.

[8] G. Holmes et al., Feature selection via the discovery of simple classification rules, 1995.

[9] R. Battiti, Using mutual information for selecting features in supervised neural net learning, IEEE Trans. Neural Networks, 1994.

[10] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2005.

[11] D. C. St. Clair et al., Using Taguchi's method of experimental design to control errors in layered perceptrons, IEEE Trans. Neural Networks, 1995.

[12] H. W. Altland, Engineering Methods for Robust Product Design, 1996.

[13] Q. Li et al., Principal feature classification, IEEE Trans. Neural Networks, 1997.

[14] S. K. Rogers et al., Bayesian selection of important features for feedforward neural networks, Neurocomputing, 1993.

[15] A. M. Segre, Programs for Machine Learning, 1994.

[16] S. B. Navathe et al., Knowledge mining by imprecise querying: a classification-based approach, Proc. Eighth Int. Conf. on Data Engineering, 1992.

[17] J. R. Quinlan, C4.5: Programs for Machine Learning, 1992.

[18] K. W. Bauer et al., Determining input features for multilayer perceptrons, Neurocomputing, 1995.

[19] A. M. Fraser and H. L. Swinney, Independent coordinates for strange attractors from mutual information, Physical Review A, 1986.

[20] I. Jolliffe, Principal Component Analysis, 2002.

[21] G. Taguchi, Taguchi on Robust Technology Development, 1992.

[22] H. Liu et al., Neural-network feature selector, IEEE Trans. Neural Networks, 1997.

[23] N. R. Draper et al., Applied Regression Analysis, 2nd ed., Wiley Series in Probability and Mathematical Statistics, 1981.