Tutorial on Practical Prediction Theory for Classification

We discuss basic prediction theory and its impact on the evaluation of classification success, its implications for learning algorithm design, and its uses during learning algorithm execution. This tutorial is meant to be a comprehensive compilation of results which are both theoretically rigorous and quantitatively useful. There are two important implications of the results presented here. The first is that common practices for reporting results in classification should change to use the test set bound. The second is that train set bounds can sometimes be used to directly motivate learning algorithms.
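
A minimal sketch, assuming SciPy is available, of how a test set bound of this kind can be computed: invert the binomial tail to turn an observed test error count into a high-confidence upper bound on the true error rate. The function name and bisection tolerance below are illustrative choices, not taken from the tutorial's text.

    # Binomial tail inversion: largest error rate p such that
    # P[Binomial(m, p) <= k] >= delta. With probability at least 1 - delta
    # over the draw of an i.i.d. test set, the true error rate of a fixed
    # classifier with k errors on m test examples is at most this value.
    from scipy.stats import binom

    def binomial_tail_inversion(m, k, delta, tol=1e-10):
        lo, hi = k / m, 1.0
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if binom.cdf(k, m, mid) >= delta:
                lo = mid   # tail probability still >= delta, bound can grow
            else:
                hi = mid
        return hi

    # Example: 8 errors on a 1000-example test set, 95% confidence.
    print(binomial_tail_inversion(m=1000, k=8, delta=0.05))

The bisection works because the binomial tail probability is monotonically decreasing in p, so the set of error rates consistent with the observed test performance at confidence delta is an interval.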
