The Sample Complexity of Pattern Classification with Neural Networks: The Size of the Weights is More Important than the Size of the Network
[1] A. Kolmogorov et al. ε-Entropy and ε-capacity of sets in functional spaces, 1961.
[2] W. Hoeffding. Probability Inequalities for Sums of Bounded Random Variables, 1963.
[3] G. Lorentz. Approximation of Functions, 1966.
[4] Vladimir Vapnik and Alexey Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities, 1971.
[5] D. Pollard. Convergence of stochastic processes, 1984.
[6] Leslie G. Valiant et al. A general lower bound on the number of examples needed for learning, 1988, COLT '88.
[7] K. Lang et al. Learning to tell two spirals apart, 1988.
[8] David Haussler et al. What Size Net Gives Valid Generalization?, 1989, Neural Computation.
[9] David Haussler et al. Learnability and the Vapnik-Chervonenkis dimension, 1989, JACM.
[10] Robert E. Schapire et al. Efficient distribution-free learning of probabilistic concepts, 1990, Proceedings of the 31st Annual Symposium on Foundations of Computer Science.
[11] Anders Krogh et al. Introduction to the theory of neural computation, 1994, The advanced book program.
[12] Bernhard E. Boser et al. A training algorithm for optimal margin classifiers, 1992, COLT '92.
[13] Isabelle Guyon et al. Automatic Capacity Tuning of Very Large VC-Dimension Classifiers, 1992, NIPS.
[14] L. Jones. A Simple Lemma on Greedy Approximation in Hilbert Space and Convergence Rates for Projection Pursuit Regression and Neural Network Training, 1992.
[15] Eduardo D. Sontag et al. Feedforward Nets for Interpolation and Classification, 1992, J. Comput. Syst. Sci.
[16] David Haussler et al. Decision Theoretic Generalizations of the PAC Model for Neural Net and Other Learning Applications, 1992, Inf. Comput.
[17] Andrew R. Barron et al. Universal approximation bounds for superpositions of a sigmoidal function, 1993, IEEE Trans. Inf. Theory.
[18] John Shawe-Taylor et al. A Result of Vapnik with Applications, 1993, Discret. Appl. Math.
[19] Paul W. Goldberg et al. Bounding the Vapnik-Chervonenkis Dimension of Concept Classes Parameterized by Real Numbers, 1993, COLT '93.
[20] N. Fisher et al. Probability Inequalities for Sums of Bounded Random Variables, 1994.
[21] Marek Karpinski et al. Polynomial bounds for VC dimension of sigmoidal neural networks, 1995, STOC '95.
[22] Philip M. Long et al. A Generalization of Sauer's Lemma, 1995, J. Comb. Theory A.
[23] Philip M. Long et al. Characterizations of Learnability for Classes of {0, ..., n}-Valued Functions, 1995, J. Comput. Syst. Sci.
[24] Yoav Freund et al. A decision-theoretic generalization of on-line learning and an application to boosting, 1995, EuroCOLT.
[25] Gábor Lugosi et al. A data-dependent skeleton estimate for learning, 1996, COLT '96.
[26] Peter L. Bartlett et al. Efficient agnostic learning of neural networks with bounded fan-in, 1996, IEEE Trans. Inf. Theory.
[27] László Györfi et al. A Probabilistic Theory of Pattern Recognition, 1996, Stochastic Modelling and Applied Probability.
[28] John Shawe-Taylor et al. A framework for structural risk minimisation, 1996, COLT '96.
[29] Yoav Freund et al. Experiments with a New Boosting Algorithm, 1996, ICML.
[30] Peter L. Bartlett et al. For Valid Generalization the Size of the Weights is More Important than the Size of the Network, 1996, NIPS.
[31] David L. Neuhoff et al. Asymptotic distribution of the errors in scalar and vector quantizers, 1996, IEEE Trans. Inf. Theory.
[32] Sanjeev R. Kulkarni et al. Covering numbers for real-valued function classes, 1997, IEEE Trans. Inf. Theory.
[33] G. Lugosi et al. A Data-Dependent Skeleton Estimate and a Scale-Sensitive Dimension for Classification, 1997.
[34] Leonid Gurvits et al. Approximation and Learning of Convex Superpositions, 1997, J. Comput. Syst. Sci.
[35] Yoav Freund et al. A decision-theoretic generalization of on-line learning and an application to boosting, 1997, J. Comput. Syst. Sci.
[36] Yoav Freund et al. Boosting the margin: A new explanation for the effectiveness of voting methods, 1997, ICML.
[37] Noga Alon et al. Scale-sensitive dimensions, uniform convergence, and learnability, 1997, JACM.
[38] Anthony G. Oettinger et al. IEEE Transactions on Information Theory, 1998.
[39] John Shawe-Taylor et al. Structural Risk Minimization Over Data-Dependent Hierarchies, 1998, IEEE Trans. Inf. Theory.
[40] C. Lee Giles et al. What Size Neural Network Gives Optimal Generalization? Convergence Properties of Backpropagation, 1998.
[41] Philip M. Long et al. Prediction, Learning, Uniform Convergence, and Scale-Sensitive Dimensions, 1998, J. Comput. Syst. Sci.
[42] Peter L. Bartlett et al. Function Learning from Interpolation, 1995, Combinatorics, Probability and Computing.