Efficient agnostic learning of neural networks with bounded fan-in

We show that the class of two-layer neural networks with bounded fan-in is efficiently learnable in a realistic extension to the probably approximately correct (PAC) learning model. In this model, a joint probability distribution is assumed to exist on the observations, and the learner is required to approximate the neural network that minimizes the expected quadratic error. As special cases, the model allows learning real-valued functions with bounded noise, learning probabilistic concepts, and learning the best approximation to a target function that cannot be well approximated by the neural network. The networks we consider have real-valued inputs and outputs, an unlimited number of threshold hidden units with bounded fan-in, and a bound on the sum of the absolute values of the output weights. The number of computation steps of the learning algorithm is bounded by a polynomial in 1/ε, 1/δ, n, and B, where ε is the desired accuracy, δ is the probability that the algorithm fails, n is the input dimension, and B is the bound on both the absolute value of the target (which may be a random variable) and the sum of the absolute values of the output weights. In obtaining the result, we also extend some results on iterative approximation of functions in the closure of the convex hull of a function class and on the sample complexity of agnostic learning with the quadratic loss function.
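To make the hypothesis class concrete, the following is a minimal Python sketch (not the paper's algorithm) of the networks studied here: hidden units are linear threshold gates, each reading at most a fixed number of the n real inputs, and the output is a linear combination whose weights have l1 norm at most B. All names (ThresholdUnit, TwoLayerNetwork, quadratic_risk) are illustrative assumptions.

```python
import numpy as np

class ThresholdUnit:
    """A linear threshold gate reading at most `fan_in` of the inputs."""
    def __init__(self, indices, weights, bias):
        self.indices = np.asarray(indices)          # which inputs this unit reads
        self.weights = np.asarray(weights, dtype=float)
        self.bias = float(bias)

    def __call__(self, x):
        # Output 1 if the affine form over the selected inputs is nonnegative, else 0.
        return 1.0 if self.weights @ x[self.indices] + self.bias >= 0 else 0.0

class TwoLayerNetwork:
    """Linear combination of threshold units with bounded output-weight l1 norm."""
    def __init__(self, units, output_weights, B):
        output_weights = np.asarray(output_weights, dtype=float)
        # Class constraint from the abstract: sum of |output weights| is at most B.
        assert np.sum(np.abs(output_weights)) <= B
        self.units = units
        self.output_weights = output_weights

    def __call__(self, x):
        hidden = np.array([u(x) for u in self.units])
        return self.output_weights @ hidden

def quadratic_risk(net, xs, ys):
    """Empirical quadratic error; the agnostic goal is to get within epsilon of
    the best expected quadratic error achievable over the class, with
    probability at least 1 - delta, for an arbitrary joint distribution."""
    preds = np.array([net(x) for x in xs])
    return float(np.mean((preds - np.asarray(ys, dtype=float)) ** 2))
```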
