Extensions to Online Feature Selection Using Bagging and Boosting

Feature subset selection can be used to sieve through large volumes of data and discover the most informative subset of variables for a particular learning problem. Yet, due to memory and other resource constraints (e.g., CPU availability), many state-of-the-art feature subset selection methods cannot scale to high-dimensional data or to data sets with an extremely large number of instances. In this brief, we extend online feature selection (OFS), a recently introduced approach that uses partial feature information, by developing an ensemble of online linear models to make predictions. The OFS approach employs a linear model as the base classifier, which allows the $\ell_0$-norm of the parameter vector to be constrained to perform feature selection, yielding sparse linear models. We demonstrate that the proposed ensemble model typically yields a smaller error rate than any single linear model, while maintaining the same level of sparsity and complexity at test time.
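
The sketch below illustrates the general setup the abstract describes, under stated assumptions rather than as the authors' exact method: a mistake-driven online linear model whose weight vector is truncated to its $B$ largest-magnitude entries after each update (the $\ell_0$ constraint), combined into an ensemble via online bagging in the style of Oza and Russell, where each base model sees each example Poisson(1)-many times. All names (`SparseOnlinePerceptron`, `OnlineBaggedEnsemble`, `truncate`, and the parameters `B`, `eta`, `n_models`) are hypothetical, and the update rule is a simplified stand-in for the full OFS truncation step.

```python
import numpy as np

rng = np.random.default_rng(0)

def truncate(w, B):
    """Keep only the B largest-magnitude entries of w (l0 constraint)."""
    if np.count_nonzero(w) > B:
        keep = np.argsort(np.abs(w))[-B:]   # indices of the B largest |w_i|
        mask = np.zeros_like(w)
        mask[keep] = 1.0
        w = w * mask
    return w

class SparseOnlinePerceptron:
    """Online linear base model: mistake-driven updates plus l0 truncation.
    A simplified sketch of an OFS-style learner, not the exact algorithm."""
    def __init__(self, dim, B, eta=0.2):
        self.w = np.zeros(dim)
        self.B = B        # maximum number of nonzero weights
        self.eta = eta    # learning rate

    def predict(self, x):
        return np.sign(self.w @ x) or 1.0   # break ties toward +1

    def update(self, x, y):
        if y * (self.w @ x) <= 0:           # update only on a mistake
            self.w += self.eta * y * x
            self.w = truncate(self.w, self.B)

class OnlineBaggedEnsemble:
    """Online bagging (Oza & Russell): each base model receives each
    example Poisson(1)-many times; prediction is an unweighted vote."""
    def __init__(self, dim, B, n_models=10):
        self.models = [SparseOnlinePerceptron(dim, B) for _ in range(n_models)]

    def predict(self, x):
        votes = sum(m.predict(x) for m in self.models)
        return 1.0 if votes >= 0 else -1.0

    def update(self, x, y):
        for m in self.models:
            for _ in range(rng.poisson(1.0)):
                m.update(x, y)

# Usage sketch on a synthetic binary stream with 3 informative features.
d, B = 100, 10
w_true = np.zeros(d)
w_true[:3] = [2.0, -1.5, 1.0]
ens = OnlineBaggedEnsemble(dim=d, B=B, n_models=10)
mistakes = 0
for t in range(2000):
    x = rng.standard_normal(d)
    y = 1.0 if w_true @ x > 0 else -1.0
    mistakes += ens.predict(x) != y         # prequential test-then-train
    ens.update(x, y)
print(f"online error rate: {mistakes / 2000:.3f}")
```

Each base model stays $B$-sparse, so the ensemble's test-time cost is at most `n_models` sparse dot products, consistent with the abstract's claim that sparsity and test-time complexity are preserved.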
