Comparison of Stability for Different Families of Filter-Based and Wrapper-Based Feature Selection

Due to the prevalence of high dimensionality (having a large number of independent attributes), feature selection techniques (which reduce the feature set to a more manageable size) have become quite popular. These reduced feature subsets can help improve the performance of classification models and can also inform researchers about which features are most relevant for the problem at hand. For this latter purpose, it is often most important that the features chosen are consistent even in the face of changes (perturbations) to the dataset. While previous studies have considered the problem of finding so-called "stable" feature selection techniques, none has examined stability across all three major categories of feature selection technique: filter-based feature rankers (which use statistical measures to assign scores to each feature), filter-based subset evaluators (which also employ statistical approaches, but consider whole feature subsets at a time), and wrapper-based subset evaluators (which also consider whole subsets, but which build classification models to evaluate these subsets). In the present study, we use two datasets from the domain of Twitter profile mining to compare the stability of five filter-based rankers, two filter-based subset evaluators, and five wrapper-based subset evaluators. We find that the rankers are most stable, followed by the filter-based subset evaluators, with the wrappers being the least stable. We also show that the relative performance among the techniques within each group is consistent across dataset and perturbation level. However, the relative stability of the two datasets does vary between the groups, showing that the effects are more complex than simply "one group is always more stable than another group".
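To make the notion of stability under perturbation concrete, the following is a minimal sketch (not the paper's actual protocol) of how one might measure it: repeatedly subsample the data, run a simple filter-based ranker on each subsample, and average the pairwise Jaccard similarity of the selected feature subsets. The correlation-based scorer, the 80% subsampling rate, and the synthetic data are all illustrative assumptions.

```python
import numpy as np

def rank_features(X, y, k):
    # Illustrative filter-based ranker: score each feature by the absolute
    # Pearson correlation between that feature and the class label, then
    # keep the indices of the k highest-scoring features.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    norms = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12
    scores = np.abs(Xc.T @ yc) / norms
    return set(np.argsort(scores)[-k:])

def jaccard(a, b):
    # Similarity of two selected subsets: |intersection| / |union|
    return len(a & b) / len(a | b)

def selection_stability(X, y, k=8, n_runs=20, frac=0.8, seed=0):
    # Perturb the dataset by subsampling frac of the rows n_runs times,
    # select k features on each perturbed copy, and report the mean
    # pairwise Jaccard similarity across all pairs of runs.
    rng = np.random.default_rng(seed)
    n = len(y)
    subsets = []
    for _ in range(n_runs):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        subsets.append(rank_features(X[idx], y[idx], k))
    sims = [jaccard(subsets[i], subsets[j])
            for i in range(n_runs) for j in range(i + 1, n_runs)]
    return float(np.mean(sims))

# Synthetic binary-classification data: the first 8 of 50 features carry signal.
rng = np.random.default_rng(42)
n, d = 300, 50
y = rng.integers(0, 2, size=n).astype(float)
X = rng.normal(size=(n, d))
X[:, :8] += 1.5 * y[:, None]
score = selection_stability(X, y, k=8)
print(round(score, 3))
```

A stability score near 1 means the same features are chosen regardless of perturbation; a wrapper-based evaluator would be plugged in at `rank_features` (at much higher cost, since each run trains classifiers) and, per the study's findings, would typically yield a lower score.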
