Streaming feature selection using alpha-investing

In Streaming Feature Selection (SFS), new features are sequentially considered for addition to a predictive model. When the space of potential features is large, SFS offers many advantages over traditional feature selection methods, which assume that all features are known in advance: features can be generated dynamically, focusing the search for new features on promising subspaces, and overfitting can be controlled by dynamically adjusting the threshold for adding features to the model. We describe α-investing, an adaptive complexity penalty method for SFS that dynamically adjusts the threshold on the error reduction required to add a new feature. α-investing gives false-discovery-rate-style guarantees against overfitting. It differs from standard penalty methods such as AIC, BIC, or RIC, which always drastically over- or under-fit in the limit of an infinite number of non-predictive features. Empirical results show that SFS is competitive with much more compute-intensive feature selection methods such as stepwise regression, and allows feature selection on problems with over a million potential features.
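The adaptive thresholding described above can be sketched in a few lines. This is a minimal illustration, not the authors' reference implementation: the function name, the bid rule α_i = w_i / (2i), and the default constants `w0` and `alpha_delta` follow one common formulation of the α-investing rule and should be treated as assumptions.

```python
def alpha_investing(p_values, w0=0.5, alpha_delta=0.5):
    """Sketch of the alpha-investing rule for streaming feature selection.

    For the i-th candidate feature (1-indexed) with p-value p_i:
      - bid the threshold alpha_i = w / (2 * i);
      - if p_i <= alpha_i, select the feature and earn the payout
        alpha_delta (wealth grows, loosening later thresholds);
      - otherwise pay alpha_i / (1 - alpha_i) from the wealth.

    Returns the 1-based indices of the selected features.
    """
    w = w0
    selected = []
    for i, p in enumerate(p_values, start=1):
        # If wealth is exhausted, the bid is zero and no feature can enter.
        alpha_i = max(w, 0.0) / (2 * i)
        if p <= alpha_i:
            selected.append(i)
            w += alpha_delta          # payout for a discovery
        else:
            w -= alpha_i / (1 - alpha_i)  # cost of a failed test
    return selected
```

Because wealth grows with each accepted feature and shrinks with each rejection, the procedure spends more of its "budget" on testing after successes and tightens the threshold after failures, which is what yields the false-discovery-rate-style control the abstract refers to.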
