Loss-Proportional Subsampling for Subsequent ERM

We propose a sampling scheme suitable for reducing a data set prior to selecting a hypothesis with minimum empirical risk. The sampling only considers a subset of the ultimate (unknown) hypothesis set, but can nonetheless guarantee that the final excess risk compares favorably with that achieved using the entire original data set. We demonstrate the practical benefits of our approach on a large data set, which we subsample and subsequently fit with boosted trees.
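The abstract does not spell out the procedure, but the title suggests the core idea: keep each example with probability proportional to its loss under a pilot hypothesis, and correct the resulting bias with standard importance weights. The sketch below is a minimal illustration under those assumptions; the function name `loss_proportional_subsample`, the parameters `sample_fraction` and `p_min`, and the specific clipping rule are hypothetical choices for exposition, not the paper's exact procedure.

```python
import numpy as np

def loss_proportional_subsample(losses, sample_fraction, p_min=1e-4, rng=None):
    """Keep each example with probability proportional to its pilot loss.

    Returns the kept indices and importance weights 1/p_i, so that the
    weighted empirical risk on the subsample is unbiased for the
    full-sample empirical risk. (Hypothetical sketch, not the paper's code.)
    """
    rng = np.random.default_rng() if rng is None else rng
    losses = np.asarray(losses, dtype=float)
    # Target an expected subsample size of sample_fraction * n; clip so
    # every example has a nonzero (and at most 1) keep probability.
    p = sample_fraction * losses.size * losses / losses.sum()
    p = np.clip(p, p_min, 1.0)
    keep = rng.random(losses.size) < p
    idx = np.flatnonzero(keep)
    return idx, 1.0 / p[idx]
```

In a workflow like the boosted-tree experiment the abstract alludes to, one would plausibly train a pilot model on a small uniform subsample, score the full data set to obtain `losses`, call this function, and pass the returned weights as per-example sample weights to the tree learner.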
