Online Feature Selection with Streaming Features

We propose a new online feature selection framework for applications with streaming features, in which the full feature space is not known in advance. We define streaming features as features that flow in one by one over time while the number of training examples remains fixed. This contrasts with traditional online learning methods, which deal only with sequentially added observations and pay little attention to streaming features. The critical challenges for Online Streaming Feature Selection (OSFS) include 1) the continuous growth of feature volumes over time, 2) a large feature space, possibly of unknown or infinite size, and 3) the unavailability of the entire feature set before learning starts. In this paper, we present a novel Online Streaming Feature Selection method that selects strongly relevant and nonredundant features on the fly. We also propose an efficient Fast-OSFS algorithm to improve feature selection performance. The proposed algorithms are evaluated extensively on high-dimensional datasets and in a real-world case study on impact crater detection. Experimental results demonstrate that the algorithms achieve better compactness and higher prediction accuracy than existing streaming feature selection algorithms.
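To make the streaming-feature setting concrete, the following is a minimal sketch of the general idea only, not the OSFS or Fast-OSFS algorithm itself. It assumes that relevance and redundancy can be approximated by absolute Pearson correlation with fixed thresholds. The function names `abs_corr` and `stream_select`, the threshold values, and the toy data are all hypothetical and chosen for illustration.

```python
import numpy as np

def abs_corr(a, b):
    """Absolute Pearson correlation; 0.0 if either vector is constant."""
    if np.std(a) == 0 or np.std(b) == 0:
        return 0.0
    return abs(float(np.corrcoef(a, b)[0, 1]))

def stream_select(feature_stream, y, relevance=0.3, redundancy=0.9):
    """Process features one at a time over a fixed set of examples.

    Assumption: correlation thresholds stand in for the relevance and
    redundancy analysis described in the abstract.
    """
    selected = {}  # feature name -> column of values
    for name, x in feature_stream:
        # Relevance check: drop features weakly related to the target.
        if abs_corr(x, y) < relevance:
            continue
        # Redundancy check: drop features that nearly duplicate a kept one.
        if any(abs_corr(x, kept) >= redundancy for kept in selected.values()):
            continue
        selected[name] = x
    return list(selected)

# Toy example: 8 fixed training examples, 3 features arriving in sequence.
y = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
f1 = np.array([0.1, -0.1, 0.05, 0.0, 0.9, 1.1, 1.0, 0.95])  # tracks y
f2 = np.array([1, -1, 1, -1, 1, -1, 1, -1], dtype=float)     # unrelated to y
f3 = 2.0 * f1 + 1.0                                          # rescaled copy of f1
stream = iter([("f1", f1), ("f2", f2), ("f3", f3)])
print(stream_select(stream, y))
```

In this sketch the selector never sees the full feature set: each feature is accepted or discarded as it arrives, using only the target and the features kept so far.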
