Feature Selection Filters Based on the Permutation Test

We investigate the problem of supervised feature selection within the filtering framework. In our approach, applicable to two-class problems, a feature's strength is inversely proportional to the p-value of the null hypothesis that its class-conditional densities, p(X|Y = 0) and p(X|Y = 1), are identical. To estimate the p-values, we use Fisher's permutation test combined with four simple filtering criteria in the role of test statistics: sample mean difference, symmetric Kullback-Leibler distance, information gain, and the chi-square statistic. The experimental results of our study, performed using the naive Bayes classifier and support vector machines, strongly indicate that the permutation test improves the above-mentioned filters and can be used effectively when the sample size is relatively small and the number of features relatively large.
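The per-feature scoring described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses the absolute sample mean difference as the test statistic (the simplest of the four criteria mentioned) and a Monte Carlo approximation of Fisher's permutation test, with a hypothetical helper name `permutation_pvalue`:

```python
import random

def permutation_pvalue(x, y, n_perm=1000, seed=0):
    """Estimate the p-value of the null hypothesis that a feature's
    class-conditional distributions are identical, by randomly
    permuting the class labels y over the feature values x.

    Test statistic: absolute difference of class-conditional means
    (one of the four filtering criteria discussed in the paper).
    """
    rng = random.Random(seed)

    def mean_diff(labels):
        g0 = [v for v, lab in zip(x, labels) if lab == 0]
        g1 = [v for v, lab in zip(x, labels) if lab == 1]
        return abs(sum(g0) / len(g0) - sum(g1) / len(g1))

    observed = mean_diff(y)
    count = 0
    for _ in range(n_perm):
        perm = y[:]
        rng.shuffle(perm)  # shuffling preserves the class sizes
        if mean_diff(perm) >= observed:
            count += 1
    # Add-one correction keeps the Monte Carlo estimate strictly positive.
    return (count + 1) / (n_perm + 1)
```

A filter would compute this p-value for every feature and rank features in ascending order of p-value, so that a small p-value marks a strong feature regardless of which underlying statistic is plugged in.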
