Improved bounds on the sample complexity of learning

We present a new general upper bound on the number of examples required to estimate all of the expectations of a set of random variables uniformly well. The quality of the estimates is measured using a variant of the relative error proposed by Haussler and Pollard. We also show that our bound is within a constant factor of the best possible. Our upper bound implies improved bounds on the sample complexity of learning in Haussler's decision-theoretic model.
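To make the error measure concrete, here is a minimal sketch. The abstract does not spell out the variant of relative error, so the sketch assumes the form d_nu(r, s) = |r - s| / (nu + r + s) for nonnegative r, s and a parameter nu > 0, which is the form commonly associated with Haussler's decision-theoretic framework. The function names, the Bernoulli variables, and the parameter values are all hypothetical choices for illustration.

```python
import random


def d_nu(r, s, nu):
    """Assumed relative deviation |r - s| / (nu + r + s), for r, s >= 0 and nu > 0."""
    return abs(r - s) / (nu + r + s)


def worst_relative_deviation(true_means, samples, nu):
    """Largest d_nu between each empirical mean and its true expectation.

    "Uniformly well" means this maximum over the whole set of random
    variables is small, not just the deviation for a single variable.
    """
    worst = 0.0
    for mu, xs in zip(true_means, samples):
        empirical = sum(xs) / len(xs)
        worst = max(worst, d_nu(empirical, mu, nu))
    return worst


# Illustrative data: three Bernoulli variables with hypothetical means.
rng = random.Random(0)
true_means = [0.01, 0.1, 0.5]
m = 2000  # number of examples (arbitrary choice for the demo)
samples = [
    [1.0 if rng.random() < p else 0.0 for _ in range(m)]
    for p in true_means
]

print(worst_relative_deviation(true_means, samples, nu=0.05))
```

Under this assumed form, the parameter nu keeps the measure well defined when both quantities are near zero, and the deviation always lies in [0, 1).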
