A class-specific ensemble feature selection approach for classification problems

Due to substantial increases in data acquisition and storage, data pre-processing techniques such as feature selection have become increasingly popular in classification tasks. This research proposes a new feature selection algorithm, Class-specific Ensemble Feature Selection (CEFS), which finds a class-specific subset of features that is optimal for each class in the dataset. Each subset is then paired with a base classifier, and the resulting classifiers are combined into an ensemble model that is used to predict unseen instances. CEFS aims to provide the diversity and base-classifier disagreement sought in effective ensemble models by supplying highly useful, yet highly exclusive, feature subsets. In addition, the use of a wrapper method gives each subset the chance to perform optimally under its respective base classifier. Preliminary experiments with this approach suggest potential improvements of more than 10% over existing methods.
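
The abstract does not specify the search strategy, the base classifier, or the rule used to combine the ensemble, so the following is only a minimal sketch of the general idea under stated assumptions: a one-vs-rest split per class, greedy forward wrapper selection scored on a hold-out split, a nearest-centroid base classifier, and prediction by the highest per-class score. All function names and the toy data are hypothetical and are not taken from the paper.

```python
import random

def centroid_fit(X, y_bin, feats):
    """Fit a binary nearest-centroid model using only the features in feats."""
    cents = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y_bin) if t == label]
        cents[label] = [sum(r[f] for r in rows) / len(rows) for f in feats]
    return cents

def centroid_score(cents, feats, x):
    """Positive when x lies closer to the target-class centroid than to the rest."""
    dist = {label: sum((x[f] - c) ** 2 for f, c in zip(feats, cents[label]))
            for label in (0, 1)}
    return dist[0] - dist[1]

def holdout_accuracy(X, y_bin, feats, split):
    """Wrapper criterion: train on X[:split], measure accuracy on X[split:]."""
    cents = centroid_fit(X[:split], y_bin[:split], feats)
    hits = sum((centroid_score(cents, feats, x) > 0) == (t == 1)
               for x, t in zip(X[split:], y_bin[split:]))
    return hits / (len(X) - split)

def select_for_class(X, y_bin, split):
    """Greedy forward selection (an assumed search strategy) for one class."""
    chosen, best = [], 0.0
    while len(chosen) < len(X[0]):
        # Ties on accuracy are broken in favour of the lowest feature index.
        acc, neg_f = max((holdout_accuracy(X, y_bin, chosen + [f], split), -f)
                         for f in range(len(X[0])) if f not in chosen)
        if acc <= best:
            break  # stop when no remaining feature improves hold-out accuracy
        chosen, best = chosen + [-neg_f], acc
    return chosen

def fit_cefs_sketch(X, y, split):
    """Build one (feature subset, base classifier) pair per class."""
    models = {}
    for c in sorted(set(y)):
        y_bin = [1 if t == c else 0 for t in y]
        feats = select_for_class(X, y_bin, split)
        models[c] = (feats, centroid_fit(X, y_bin, feats))
    return models

def predict(models, x):
    """Assumed combination rule: the class whose model scores x highest wins."""
    return max(models, key=lambda c: centroid_score(models[c][1], models[c][0], x))

# Toy data (hypothetical): feature c is informative for class c, feature 3 is noise.
rng = random.Random(0)
y = [i % 3 for i in range(90)]
X = [[rng.gauss(3.0 if f == label else 0.0, 0.3) for f in range(3)]
     + [rng.gauss(0.0, 1.0)] for label in y]

models = fit_cefs_sketch(X, y, split=60)
subsets = {c: feats for c, (feats, _) in models.items()}
# Sanity check on the data used for fitting, not a held-out estimate.
accuracy = sum(predict(models, x) == t for x, t in zip(X, y)) / len(y)
print(subsets, accuracy)
```

In this sketch each class ends up with its own small feature subset, which is one way the base classifiers could come to disagree with one another; a fuller treatment would replace the single hold-out split with cross-validation and evaluate on genuinely unseen instances.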
