Agnostic PAC Learning of Functions on Analog Neural Nets

We consider learning on multilayer neural nets with piecewise polynomial activation functions and a fixed number k of numerical inputs. We exhibit arbitrarily large network architectures for which efficient and provably successful learning algorithms exist in a rather realistic refinement of Valiant's model for probably approximately correct learning ("PAC learning"). In this refinement no a priori assumptions are required about the "target function" (agnostic learning), arbitrary noise is permitted in the training sample, and both the target outputs and the network outputs may be arbitrary reals. The number of computation steps of the learning algorithm LEARN that we construct is bounded by a polynomial in the bit-length n of the fixed number of input variables, in the bound s for the allowed bit-length of weights, in 1/ε, where ε is an arbitrary given bound for the true error of the neural net after training, and in 1/δ, where δ is an arbitrary given bound for the probability that the learning algorithm fails for a randomly drawn training sample. However, the computation time of LEARN is exponential in the number of weights of the considered network architecture, so LEARN is only of interest for neural nets of small size. This article provides the details for the previously published extended abstract (Maass 1994).
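The abstract states the guarantee in terms of ε, δ, the weight bit-length s and the number of weights, but it does not show how these quantities interact. The Python sketch below is a rough illustration only. It is not the algorithm LEARN, which is described as handling real-valued outputs and being polynomial in s. Instead, it performs brute-force empirical risk minimisation over a hypothetical net with k = 1 input, one hidden unit with a piecewise-linear activation, and w = 4 weights restricted to s-bit two's-complement integers.

It assumes a loss clipped to [0, 1], so that a standard Hoeffding-plus-union-bound argument gives a sample size polynomial in 1/ε, log(1/δ), s and w. All names (`ramp`, `net`, `s_bit_weights`, `sample_size`, `learn_by_enumeration`, `draw_sample`) are hypothetical, chosen for this illustration.

```python
import itertools
import math
import random


def ramp(z):
    # Piecewise-linear activation (a degree-1 piecewise polynomial), saturating at 0 and 1.
    return min(1.0, max(0.0, z))


def net(weights, x):
    # Hypothetical architecture: k = 1 input, one hidden ramp unit, linear output.
    w1, b1, w2, b2 = weights
    return w2 * ramp(w1 * x + b1) + b2


def s_bit_weights(s):
    # All s-bit two's-complement integers: -2**(s-1), ..., 2**(s-1) - 1.
    return list(range(-(2 ** (s - 1)), 2 ** (s - 1)))


def sample_size(num_hypotheses, eps, delta):
    # Hoeffding + union bound for a finite class with losses in [0, 1]:
    # with m >= 2 ln(2N/delta) / eps^2 examples, every empirical loss is within
    # eps/2 of its true loss with probability >= 1 - delta, so the empirical
    # minimiser has true loss at most (best true loss in the class) + eps.
    return math.ceil(2 * math.log(2 * num_hypotheses / delta) / eps ** 2)


def empirical_loss(weights, sample):
    # Absolute error clipped to [0, 1] (an assumption, so the bound above applies).
    return sum(min(1.0, abs(net(weights, x) - y)) for x, y in sample) / len(sample)


def learn_by_enumeration(sample, s, num_weights=4):
    # Brute-force empirical risk minimisation over all (2**s)**num_weights
    # weight vectors: exponential in the number of weights, linear in m.
    best, best_loss = None, float("inf")
    for weights in itertools.product(s_bit_weights(s), repeat=num_weights):
        loss = empirical_loss(weights, sample)
        if loss < best_loss:
            best, best_loss = weights, loss
    return best, best_loss


def draw_sample(m, rng):
    # Agnostic setting: labels come from a noisy ramp, and about 10% are
    # replaced by arbitrary values; no target function is assumed by the learner.
    sample = []
    for _ in range(m):
        x = rng.uniform(-2.0, 2.0)
        y = ramp(x) + rng.uniform(-0.1, 0.1)
        if rng.random() < 0.1:
            y = rng.uniform(-3.0, 3.0)
        sample.append((x, y))
    return sample


if __name__ == "__main__":
    s, eps, delta = 2, 0.25, 0.05
    n_hyp = len(s_bit_weights(s)) ** 4          # (2**s)**w candidate nets
    m = sample_size(n_hyp, eps, delta)
    sample = draw_sample(m, random.Random(0))
    weights, loss = learn_by_enumeration(sample, s)
    print(f"{n_hyp} candidate nets, m = {m}, weights {weights}, empirical loss {loss:.3f}")
```

With s = 2, ε = 0.25 and δ = 0.05, there are 4^4 = 256 candidate nets and the bound asks for m = 296 examples. The enumeration makes the exponential dependence on the number of weights explicit. However, this naive grid search is also exponential in s, whereas the abstract states that LEARN is polynomial in s. The sketch therefore only illustrates the empirical-minimisation principle and the sample-size arithmetic.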

[1] J. Milnor, On the Betti numbers of real varieties, 1964.

[2] John E. Savage et al., The Complexity of Computing, 1976.

[3] Kenneth Steiglitz et al., Combinatorial Optimization: Algorithms and Complexity, 1981.

[4] Leslie G. Valiant et al., A theory of the learnable, 1984, STOC '84.

[5] D. E. Rumelhart et al., Parallel Distributed Processing: Explorations in the Microstructure of Cognition, 1986.

[6] Richard P. Lippmann et al., An introduction to computing with neural nets, 1987.

[7] Ronald L. Rivest et al., Training a 3-node neural network is NP-complete, 1988, COLT '88.

[8] David Haussler et al., Equivalence of models for polynomial learnability, 1988, COLT '88.

[9] M. Kearns et al., Cryptographic limitations on learning Boolean formulae and finite automata, 1989, STOC '89.

[10] J. Renegar et al., On the Computational Complexity and Geometry of the First-Order Theory of the Reals, Part I, 1989.

[11] David Haussler et al., What Size Net Gives Valid Generalization?, 1989, Neural Computation.

[12] David Haussler et al., Learnability and the Vapnik-Chervonenkis dimension, 1989, JACM.

[13] J. Stephen Judd et al., Neural network design and the complexity of learning, 1990, Neural network modeling and connectionism.

[14] D. Pollard, Empirical Processes: Theory and Applications, 1990.

[16] Linda Sellie et al., Toward efficient agnostic learning, 1992, COLT '92.

[17] James Renegar et al., On the Computational Complexity and Geometry of the First-Order Theory of the Reals, Part I: Introduction. Preliminaries. The Geometry of Semi-Algebraic Sets. The Decision Problem for the Existential Theory of the Reals, 1992, J. Symb. Comput.

[18] David Haussler et al., Decision Theoretic Generalizations of the PAC Model for Neural Net and Other Learning Applications, 1992, Inf. Comput.

[19] Paul W. Goldberg et al., Bounding the Vapnik-Chervonenkis Dimension of Concept Classes Parameterized by Real Numbers, 1993, COLT '93.

[20] Wolfgang Maass et al., Bounds for the computational power and learning complexity of analog neural nets, 1993, SIAM J. Comput.

[21] Leslie G. Valiant et al., Cryptographic Limitations on Learning Boolean Formulae and Finite Automata, 1993, Machine Learning: From Theory to Applications.

[22] Wolfgang Maass et al., Computing on Analog Neural Nets with Arbitrary Real Weights, 1994.

[23] Wolfgang Maass et al., Perspectives of Current Research about the Complexity of Learning on Neural Nets, 1994.

[24] Robert E. Schapire et al., Efficient distribution-free learning of probabilistic concepts, 1990, Proceedings of the 31st Annual Symposium on Foundations of Computer Science.