Feature Selection: A Literature Review

Identifying relevant features has become essential for applying data mining algorithms effectively in real-world scenarios. Consequently, many feature selection methods have been proposed in the literature for obtaining relevant features or feature subsets that support classification and clustering. This paper introduces the concepts of feature relevance, the general procedure of feature selection, its evaluation criteria, and its characteristics. It also provides a comprehensive overview, categorization, and comparison of existing feature selection methods. In addition, it offers guidelines that help users choose a feature selection algorithm without detailed knowledge of each one. We conclude with real-world applications, challenges, and future research directions of feature selection.
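To make "feature relevance" and "evaluation criteria" concrete, the sketch below shows one common filter-style criterion. It scores each feature independently by its mutual information with the class label and keeps the top k. The sketch rests on several assumptions: features are discrete, probabilities are plug-in estimates from counts, and the function names and toy dataset are hypothetical. It is not an algorithm taken from this review.

```python
from collections import Counter
from math import log2


def mutual_information(feature, labels):
    """Plug-in estimate of I(X; Y) in bits from paired discrete observations."""
    n = len(labels)
    joint = Counter(zip(feature, labels))  # counts of (x, y) pairs
    px = Counter(feature)                  # marginal counts of x
    py = Counter(labels)                   # marginal counts of y
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        mi += pxy * log2(pxy / ((px[x] / n) * (py[y] / n)))
    return mi


def rank_features(columns, labels, k):
    """Filter selection: score every feature on its own, keep the k best."""
    scores = {name: mutual_information(values, labels)
              for name, values in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical toy data: "copy" mirrors the label, "noise" is independent
# of it, and "const" never varies, so only "copy" carries information.
labels = [0, 0, 1, 1]
columns = {
    "copy": [0, 0, 1, 1],
    "noise": [0, 1, 0, 1],
    "const": [5, 5, 5, 5],
}
selected, scores = rank_features(columns, labels, k=1)
```

Here `selected` is `["copy"]`, with a score of 1 bit, while the other two features score 0. Because each feature is scored in isolation, this kind of criterion is fast. However, it cannot detect redundancy between features or features that are informative only jointly. Wrapper and embedded approaches address those cases at higher computational cost.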
