A Variance Reduction Framework for Stable Feature Selection

Besides high accuracy, stability of feature selection has recently attracted strong interest in knowledge discovery from high-dimensional data. In this study, we present a theoretical framework about the relationship between the stability and accuracy of feature selection based on a formal bias-variance decomposition of feature selection error. The framework also suggests a variance reduction approach for improving the stability of feature selection algorithms. Furthermore, we propose an empirical variance reduction framework, margin based instance weighting, which weights training instances according to their influence to the estimation of feature relevance. We also develop an efficient algorithm under this framework. Experiments based on synthetic data and real-world micro array data verify both the theoretical framework and the effectiveness of the proposed algorithm on variance reduction. The proposed algorithm is also shown to be effective at improving subset stability, while maintaining comparable classification accuracy based on selected features.

[1]  M S Pepe,et al.  Phases of biomarker development for early detection of cancer. , 2001, Journal of the National Cancer Institute.

[2]  Pedro M. Domingos A Unifeid Bias-Variance Decomposition and its Applications , 2000, ICML.

[3]  Isabelle Guyon,et al.  An Introduction to Variable and Feature Selection , 2003, J. Mach. Learn. Res..

[4]  Koby Crammer,et al.  Margin Analysis of the LVQ Algorithm , 2002, NIPS.

[5]  Ron Kohavi,et al.  Wrappers for Feature Subset Selection , 1997, Artif. Intell..

[6]  Qiang Yang,et al.  Feature selection in a kernel space , 2007, ICML '07.

[7]  Daphne Koller,et al.  Toward Optimal Feature Selection , 1996, ICML.

[8]  Xue-wen Chen,et al.  Combating the Small Sample Class Imbalance Problem Using Feature Selection , 2010, IEEE Transactions on Knowledge and Data Engineering.

[9]  J. Mesirov,et al.  Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. , 1999, Science.

[10]  Ian Witten,et al.  Data Mining , 2000 .

[11]  Yoav Freund,et al.  A decision-theoretic generalization of on-line learning and an application to boosting , 1995, EuroCOLT.

[12]  George Forman,et al.  An Extensive Empirical Study of Feature Selection Metrics for Text Classification , 2003, J. Mach. Learn. Res..

[13]  Caroline C. Friedel,et al.  Reliable gene signatures for microarray classification: assessment of stability and performance , 2006, Bioinform..

[14]  Marko Robnik-Sikonja,et al.  Theoretical and Empirical Analysis of ReliefF and RReliefF , 2003, Machine Learning.

[15]  Pedro M. Domingos A Unifeid Bias-Variance Decomposition and its Applications , 2000, ICML.

[16]  André Elisseeff,et al.  Stability and Generalization , 2002, J. Mach. Learn. Res..

[17]  Naftali Tishby,et al.  Margin based feature selection - theory and algorithms , 2004, ICML.

[18]  Ludmila I. Kuncheva,et al.  A stability index for feature selection , 2007, Artificial Intelligence and Applications.

[19]  Huan Liu,et al.  Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution , 2003, ICML.

[20]  S. Ramaswamy,et al.  Translation of microarray data into clinically relevant cancer diagnostic tests using gene expression ratios in lung cancer and mesothelioma. , 2002, Cancer research.

[21]  Reuven Y. Rubinstein,et al.  Simulation and the Monte Carlo method , 1981, Wiley series in probability and mathematical statistics.

[22]  Melanie Hilario,et al.  Knowledge and Information Systems , 2007 .

[23]  James Theiler,et al.  Grafting: Fast, Incremental Feature Selection by Gradient Descent in Function Space , 2003, J. Mach. Learn. Res..

[24]  Tao Li,et al.  A comparative study of feature selection and multiclass classification methods for tissue classification based on gene expression , 2004, Bioinform..

[25]  Yvan Saeys,et al.  Robust Feature Selection Using Ensemble Feature Selection Techniques , 2008, ECML/PKDD.

[26]  Rich Caruana,et al.  On Feature Selection, Bias-Variance, and Bagging , 2009, ECML/PKDD.

[27]  E. Lander,et al.  Gene expression correlates of clinical prostate cancer behavior. , 2002, Cancer cell.

[28]  U. Alon,et al.  Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. , 1999, Proceedings of the National Academy of Sciences of the United States of America.

[29]  Chris H. Q. Ding,et al.  Consensus group stable feature selection , 2009, KDD.

[30]  Robert Tibshirani,et al.  The Entire Regularization Path for the Support Vector Machine , 2004, J. Mach. Learn. Res..

[31]  Huan Liu,et al.  Toward integrating feature selection algorithms for classification and clustering , 2005, IEEE Transactions on Knowledge and Data Engineering.

[32]  Tom M. Mitchell,et al.  Learning to Decode Cognitive States from Brain Images , 2004, Machine Learning.

[33]  Corinna Cortes,et al.  Support-Vector Networks , 1995, Machine Learning.