A Sequential Learning Approach for Scaling Up Filter-Based Feature Subset Selection

Many machine learning applications now involve data sets whose sizes were almost unimaginable a short time ago. As a result, many current algorithms cannot handle, or do not scale to, such volumes of data. Fortunately, not all features in a typical data set carry information that is relevant or useful for prediction, and identifying and removing the irrelevant features can significantly reduce the total data size. The dilemma is that some current data sets are so large that common feature selection algorithms, whose very goal is to reduce dimensionality, cannot process them, creating a vicious cycle. We describe a sequential learning framework for feature subset selection (SLSS) that scales with both the number of features and the number of observations. The proposed framework uses multi-armed bandit algorithms to sequentially search a subset of variables and assign a level of importance to each feature. The novel contribution of SLSS is its ability to scale naturally to large data sets, to evaluate such data in a very small amount of time, and to operate independently of the optimization of any classifier, which avoids unnecessary complexity. We demonstrate the capabilities of SLSS on synthetic and real-world data sets.
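
To make the idea of bandit-driven filter selection concrete, the following is a minimal sketch and not the SLSS algorithm itself, whose details the abstract does not give. It assumes an epsilon-greedy policy, an absolute Pearson correlation filter, and a binary reward for each feature the filter keeps. The function name `bandit_feature_scores` and all of its parameters are hypothetical and chosen only for illustration.

```python
import numpy as np


def bandit_feature_scores(X, y, n_rounds=500, subset_size=10, n_keep=3,
                          batch_size=500, epsilon=0.3, seed=0):
    """Score features by how often a simple filter keeps them.

    Each feature is treated as a bandit arm. In every round a small subset
    of features is evaluated on a random mini-batch of observations, so no
    single step touches the full data matrix. (Illustrative assumptions:
    epsilon-greedy policy, correlation filter, 0/1 reward.)
    """
    rng = np.random.default_rng(seed)
    n_obs, n_feat = X.shape
    pulls = np.zeros(n_feat)  # times each feature was placed in a subset
    wins = np.zeros(n_feat)   # times the filter kept the feature

    for _ in range(n_rounds):
        means = wins / np.maximum(pulls, 1)
        if rng.random() < epsilon:
            # Explore: draw a uniformly random subset of features.
            subset = rng.choice(n_feat, size=subset_size, replace=False)
        else:
            # Exploit: take the features with the highest average reward,
            # using tiny random noise to break ties.
            noisy = means + 1e-9 * rng.random(n_feat)
            subset = np.argsort(-noisy)[:subset_size]

        # Evaluate the subset on a random mini-batch of observations.
        rows = rng.choice(n_obs, size=min(batch_size, n_obs), replace=False)
        Xb, yb = X[rows][:, subset], y[rows]

        # Filter criterion: absolute Pearson correlation with the label.
        Xc = Xb - Xb.mean(axis=0)
        yc = yb - yb.mean()
        denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
        relevance = np.abs(Xc.T @ yc) / denom

        # Reward the n_keep most relevant features in this subset.
        kept = subset[np.argsort(-relevance)[:n_keep]]
        pulls[subset] += 1
        wins[kept] += 1

    # Average reward per feature serves as its importance level.
    return wins / np.maximum(pulls, 1)
```

Because each round sees only `subset_size` columns and `batch_size` rows, the per-round cost in this sketch does not depend on the total number of features or observations, and no classifier is trained at any point. Features can then be ranked by the returned scores.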
