Probabilistic Analysis of Learning in Artificial Neural Networks: The PAC Model and its Variants

There are a number of mathematical approaches to the study of learning and generalization in artificial neural networks. Here we survey the 'probably approximately correct' (PAC) model of learning and some of its variants. These models provide a probabilistic framework for the discussion of generalization and learning. This survey concentrates on the sample-complexity questions in these models; that is, the emphasis is on how many examples should be used for training. Computational complexity considerations are briefly discussed for the basic PAC model. Throughout, the importance of the Vapnik-Chervonenkis dimension is highlighted. Particular attention is devoted to describing how the probabilistic models apply in the context of neural network learning, both for networks with binary-valued output and for networks with real-valued output.
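
As a concrete illustration of the kind of sample-complexity result emphasized here (the classical distribution-free bound of Blumer, Ehrenfeucht, Haussler and Warmuth, stated in asymptotic form rather than with the survey's own constants): if a class of binary-valued functions has Vapnik-Chervonenkis dimension d, then any algorithm returning a hypothesis consistent with the training sample is probably approximately correct, with accuracy parameter \epsilon and confidence parameter \delta, provided the sample size m satisfies
\[
  m \;=\; O\!\left(\frac{1}{\epsilon}\left(d\,\log\frac{1}{\epsilon} + \log\frac{1}{\delta}\right)\right),
\]
and, conversely, on the order of $(d + \log(1/\delta))/\epsilon$ examples are necessary for any learning algorithm.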
