A Hybrid Filter-Wrapper Approach for Feature Selection

Feature selection is the task of choosing, from the original features, a small subset that achieves maximum classification accuracy. Such a subset has several important benefits: it reduces the computational complexity of learning algorithms, saves time, and improves accuracy, and the selected features can give insight to people working in the problem domain. This makes feature selection an indispensable step in classification.

This dissertation presents a two-phase approach to feature selection. In the first phase, a filter method uses the correlation coefficient and mutual information as statistical measures of similarity. This phase improves classification performance by removing redundant and unimportant features. In the second phase, a wrapper method applies sequential forward selection and sequential backward elimination. This phase selects the relevant feature subset that produces maximum accuracy for the underlying classifier. Support Vector Machine (SVM) classifiers, both linear and nonlinear, are used to evaluate the classification accuracy of the approach. Empirical results on commonly used data sets from the University of California, Irvine (UCI) repository and on microarray data sets show that the proposed method performs better in terms of classification accuracy, number of selected features, and computational efficiency.
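The two phases described above can be sketched in a few dozen lines of Python. This is a minimal, dependency-free illustration under stated assumptions, not the dissertation's implementation. The filter ranks features by mutual information with the class label (after equal-width discretization) and discards those below a relevance threshold. It then discards any feature whose absolute Pearson correlation with an already kept, more relevant feature exceeds a redundancy threshold. The wrapper then runs sequential forward selection or sequential backward elimination on the survivors. The thresholds (`min_mi`, `max_corr`, `bins`), the way the two measures are combined, and all function names are hypothetical choices. To avoid third-party packages, the SVM evaluator is replaced by a leave-one-out nearest-centroid classifier (`loo_accuracy`); plugging in a linear or nonlinear SVM would change only that function.

```python
import math
from collections import Counter

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)

def discretize(x, bins=3):
    """Equal-width binning (an assumption) so mutual information can be counted."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / bins or 1.0  # guard against constant features
    return [min(int((v - lo) / width), bins - 1) for v in x]

def mutual_information(x, y):
    """Plug-in estimate of I(X; Y) in bits for two discrete sequences."""
    n = len(x)
    pxy, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum(c / n * math.log2(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())

def filter_phase(X, y, min_mi=0.05, max_corr=0.9, bins=3):
    """Phase 1 (hypothetical thresholds): drop unimportant features (low MI with
    the class), then redundant ones (|r| >= max_corr with a kept feature)."""
    cols = [[row[j] for row in X] for j in range(len(X[0]))]
    relevance = {j: mutual_information(discretize(c, bins), y) for j, c in enumerate(cols)}
    ranked = sorted((j for j in relevance if relevance[j] >= min_mi),
                    key=lambda j: -relevance[j])
    kept = []
    for j in ranked:  # most relevant first, so of two redundant features the better one stays
        if all(abs(pearson(cols[j], cols[k])) < max_corr for k in kept):
            kept.append(j)
    return kept

def loo_accuracy(X, y, feats):
    """Leave-one-out accuracy of a nearest-centroid classifier on `feats`.
    Hypothetical stand-in for the SVM evaluator, to stay dependency-free."""
    if not feats:
        return 0.0
    correct = 0
    for i in range(len(X)):
        sums, counts = {}, Counter()
        for r, (row, label) in enumerate(zip(X, y)):
            if r != i:  # hold out sample i
                acc = sums.setdefault(label, [0.0] * len(feats))
                for t, j in enumerate(feats):
                    acc[t] += row[j]
                counts[label] += 1
        centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
        pred = min(centroids, key=lambda c: sum(
            (X[i][j] - m) ** 2 for j, m in zip(feats, centroids[c])))
        correct += pred == y[i]
    return correct / len(X)

def sequential_forward_selection(X, y, candidates):
    """Phase 2a: greedily add the feature that most improves accuracy; stop when none does."""
    selected, best = [], 0.0
    remaining = list(candidates)
    while remaining:
        scores = {j: loo_accuracy(X, y, selected + [j]) for j in remaining}
        j = max(scores, key=scores.get)
        if scores[j] <= best:
            break
        selected.append(j)
        remaining.remove(j)
        best = scores[j]
    return selected, best

def sequential_backward_elimination(X, y, candidates):
    """Phase 2b: start from all candidates; drop features while accuracy does not fall."""
    selected = list(candidates)
    best = loo_accuracy(X, y, selected)
    while len(selected) > 1:
        scores = {j: loo_accuracy(X, y, [k for k in selected if k != j]) for j in selected}
        j = max(scores, key=scores.get)  # feature whose removal hurts least
        if scores[j] < best:
            break
        selected.remove(j)
        best = scores[j]
    return selected, best

if __name__ == "__main__":
    # Toy data: f0 separates the classes, f1 = 2 * f0 (redundant),
    # f2 carries no class information, f3 is only weakly informative.
    X = [[1, 2, 5, 1], [2, 4, 1, 1], [1, 2, 5, 1], [2, 4, 1, 5],
         [8, 16, 5, 1], [9, 18, 1, 5], [8, 16, 5, 5], [9, 18, 1, 5]]
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    kept = filter_phase(X, y)
    print(kept)                                         # [0, 3]
    print(sequential_forward_selection(X, y, kept))     # ([0], 1.0)
    print(sequential_backward_elimination(X, y, kept))  # ([0], 1.0)
```

On the toy data, the filter keeps features 0 and 3: feature 1 is a scaled copy of feature 0, and feature 2 has zero mutual information with the class. Both wrapper searches then settle on feature 0 alone, because it already separates the two classes. Forward selection stops at the first step that brings no accuracy gain. Backward elimination, as written here, removes a feature whenever doing so does not lower accuracy, which biases it toward smaller subsets.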
