Efficient progressive sampling

Having access to massive amounts of data does not necessarily imply that induction algorithms must use them all. Samples often provide the same accuracy with far less computational cost. However, the correct sample size is rarely obvious. We analyze methods for progressive sampling: using progressively larger samples as long as model accuracy improves. We explore several notions of efficient progressive sampling. We analyze efficiency relative to induction with all instances; we show that a simple, geometric sampling schedule is asymptotically optimal, and we describe how best to take into account prior expectations of accuracy convergence. We then describe the issues involved in instantiating an efficient progressive sampler, including how to detect convergence. Finally, we provide empirical results comparing a variety of progressive sampling methods. We conclude that progressive sampling can be remarkably efficient.
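As a concrete illustration of the idea, the following is a minimal sketch of a progressive sampler with a geometric schedule (sample sizes n_i = n0 * a^i) and a simple accuracy-plateau convergence test. The train_and_score callable, the parameter names, and the threshold eps are hypothetical placeholders for illustration, not the paper's exact procedure.

    import random

    def progressive_sample(data, train_and_score, n0=100, a=2.0, eps=0.001):
        # data:            full list of training instances
        # train_and_score: hypothetical callable mapping a sample to held-out accuracy
        # n0:              initial sample size
        # a:               geometric growth factor (schedule n_i = n0 * a**i)
        # eps:             minimum accuracy gain required to keep growing the sample
        prev_acc = float("-inf")
        n = n0
        while True:
            size = min(int(n), len(data))
            acc = train_and_score(random.sample(data, size))
            # Convergence test (one simple option): stop when the accuracy
            # gain falls below eps, or when the entire dataset has been used.
            if acc - prev_acc < eps or size == len(data):
                return size, acc
            prev_acc = acc
            n *= a

With n0 = 100 and a = 2, the schedule grows as 100, 200, 400, ..., the kind of geometric schedule shown above to be asymptotically optimal relative to induction with all instances.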
