Scaling-Up Quadratic Programming Feature Selection

Domains such as vision, bioinformatics, web search, and web ranking involve datasets in which the number of features is very large. Feature selection is commonly employed to deal with such high-dimensional data. Recently, Quadratic Programming Feature Selection (QPFS) has been shown to outperform many existing feature selection methods on a variety of datasets. In this paper, we propose a Sequential Minimal Optimization (SMO) based framework for QPFS. This reduces the computational time of standard QPFS, which is cubic in the data dimension, to quadratic time in practice. Further, our approach requires significantly less memory than QPFS, a saving that can be critical for feature selection in high dimensions. The performance of our approach is demonstrated on three publicly available benchmark datasets from the bioinformatics domain.
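To make the idea of an SMO-style solver for QPFS concrete, the following is a minimal sketch and not the authors' implementation. It assumes the usual QPFS formulation, which the abstract does not spell out: minimize 0.5*(1-alpha)*x'Qx - alpha*f'x subject to x >= 0 and sum(x) = 1, where Q is a symmetric feature-redundancy matrix and f a relevance vector. The function name qpfs_smo, the parameters alpha, tol and max_iter, and the choice of the maximal violating pair as working set are all illustrative assumptions.

```python
def qpfs_smo(Q, f, alpha=0.5, tol=1e-8, max_iter=10000):
    """Pairwise (SMO-style) solver for an assumed QPFS objective.

    Minimizes 0.5*(1-alpha)*x'Qx - alpha*f'x over the simplex
    (x >= 0, sum(x) == 1). Q is assumed symmetric. Hypothetical
    helper for illustration only.
    """
    n = len(f)
    x = [1.0 / n] * n  # feasible start: uniform weights
    # Gradient of the objective at x.
    grad = [(1 - alpha) * sum(Q[k][m] * x[m] for m in range(n)) - alpha * f[k]
            for k in range(n)]
    for _ in range(max_iter):
        # Working set (assumed rule): the coordinate that most wants to grow,
        # and the nonzero coordinate that most wants to shrink.
        i = min(range(n), key=lambda k: grad[k])
        active = [k for k in range(n) if x[k] > 0.0]
        j = max(active, key=lambda k: grad[k])
        gap = grad[j] - grad[i]
        if gap < tol:
            break  # no pair can still decrease the objective noticeably
        # Curvature along the direction that moves weight from j to i.
        curv = (1 - alpha) * (Q[i][i] + Q[j][j] - 2 * Q[i][j])
        # Exact line minimizer, clipped so that x[j] stays non-negative.
        t = x[j] if curv <= 1e-12 else min(x[j], gap / curv)
        x[i] += t
        x[j] -= t
        # Incremental gradient update: only columns i and j of Q are read.
        for k in range(n):
            grad[k] += t * (1 - alpha) * (Q[k][i] - Q[k][j])
    return x


# Usage: three features, the first two highly redundant with each other.
Q_demo = [[1.0, 0.9, 0.0],
          [0.9, 1.0, 0.0],
          [0.0, 0.0, 1.0]]
f_demo = [1.0, 0.9, 0.5]
weights = qpfs_smo(Q_demo, f_demo, alpha=0.5)
```

Each iteration in this sketch costs time linear in the number of features and reads only two columns of Q, which illustrates why a pairwise scheme can avoid holding or factorizing the full matrix at once. Features would then be ranked by their entries in weights.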
