Boosting on a Budget: Sampling for Feature-Efficient Prediction

In this paper, we tackle the problem of feature-efficient prediction: classification using a limited number of features per test example. We show that modifying an ensemble classifier such as AdaBoost by sampling hypotheses from its final weighted predictor is well suited to this task. We further consider an extension of this problem in which the costs of examining the various features can differ, and we give an algorithm for this more general setting. We prove the correctness of our algorithms and derive bounds on the number of samples needed to achieve given error rates. We also verify the effectiveness of our methods experimentally.
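The sampling idea described above can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes the weak hypotheses are decision stumps that each read a single feature, that the ensemble weights are non-negative, and that sampling stops as soon as the next drawn hypothesis would require a feature beyond the budget. The names `stump_predict`, `full_vote`, `sampled_vote`, and `max_draws` are hypothetical and chosen only for illustration.

```python
import random
from typing import List, Sequence, Set, Tuple

# Assumed representation of a weak hypothesis: a decision stump
# (feature_index, threshold, polarity) with polarity in {+1, -1}.
Stump = Tuple[int, float, int]


def stump_predict(stump: Stump, x: Sequence[float]) -> int:
    """Return +1 or -1 by thresholding a single feature of x."""
    feature, threshold, polarity = stump
    return polarity if x[feature] > threshold else -polarity


def full_vote(stumps: List[Stump], alphas: List[float], x: Sequence[float]) -> int:
    """Standard weighted-majority prediction; may read every feature used."""
    score = sum(a * stump_predict(s, x) for s, a in zip(stumps, alphas))
    return 1 if score >= 0 else -1


def sampled_vote(
    stumps: List[Stump],
    alphas: List[float],
    x: Sequence[float],
    budget: int,
    rng: random.Random,
    max_draws: int = 1000,  # hypothetical cap so the loop always terminates
) -> Tuple[int, Set[int]]:
    """Draw stumps with probability proportional to their weight and take an
    unweighted majority vote, reading at most `budget` distinct features."""
    seen: Set[int] = set()
    score = 0
    for _ in range(max_draws):
        (stump,) = rng.choices(stumps, weights=alphas, k=1)
        feature = stump[0]
        # Assumed stopping rule: quit when a new feature would exceed the budget.
        if feature not in seen and len(seen) >= budget:
            break
        seen.add(feature)
        score += stump_predict(stump, x)
    return (1 if score >= 0 else -1), seen


# Toy usage: three stumps over three features, budget of two features.
stumps = [(0, 0.5, 1), (1, 0.5, 1), (2, 0.5, -1)]
alphas = [2.0, 1.0, 0.5]
x = [1.0, 1.0, 0.0]
label, used = sampled_vote(stumps, alphas, x, budget=2, rng=random.Random(0))
```

Because each stump is drawn with probability proportional to its weight, the expected sign of a single draw matches the sign of the normalized weighted vote, which is why an unweighted majority over the draws can approximate the full predictor while reading fewer features.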
