Extensions of a Theory of Networks for Approximation and Learning

Learning an input-output mapping from a set of examples, of the type that many neural networks have been constructed to perform, can be regarded as synthesizing an approximation of a multi-dimensional function. We develop a theoretical framework for approximation based on regularization techniques that leads to a class of three-layer networks that we call Generalized Radial Basis Functions (GRBF). GRBF networks are not only equivalent to generalized splines, but are also closely related to several pattern recognition methods and neural network algorithms. The paper introduces several extensions and applications of the technique and discusses intriguing analogies with neurobiological data.
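To make the idea of learning as function approximation concrete, the following is a minimal sketch of a network with one hidden layer of Gaussian radial units, fitted by regularized least squares. It is an illustration, not the paper's formulation: the function names (`fit_rbf`, `predict`), the Gaussian basis, the fixed evenly spaced centers, the width `sigma`, and the plain ridge penalty `lam` are all assumptions chosen for simplicity. The one feature it borrows from the abstract is the use of fewer centers than examples.

```python
import math

def gaussian(r2, sigma):
    """Gaussian radial basis function of the squared distance r2."""
    return math.exp(-r2 / (2.0 * sigma ** 2))

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_rbf(xs, ys, centers, sigma=1.0, lam=1e-6):
    """Fit output weights of a radial basis network with fixed centers.

    Assumption: a simple ridge penalty lam * I is used as the regularizer.
    """
    # Hidden-layer activations: G[i][a] = exp(-||x_i - t_a||^2 / (2 sigma^2))
    G = [[gaussian(sq_dist(x, t), sigma) for t in centers] for x in xs]
    n, m = len(centers), len(xs)
    # Regularized normal equations: (G^T G + lam I) c = G^T y
    A = [[sum(G[i][a] * G[i][b] for i in range(m)) + (lam if a == b else 0.0)
          for b in range(n)] for a in range(n)]
    rhs = [sum(G[i][a] * ys[i] for i in range(m)) for a in range(n)]
    return solve(A, rhs)

def predict(x, centers, coeffs, sigma=1.0):
    """Network output: weighted sum of the radial units."""
    return sum(c * gaussian(sq_dist(x, t), sigma)
               for c, t in zip(coeffs, centers))

# Hypothetical example data: 20 samples of sin on [0, 2*pi], 7 centers.
xs = [(2.0 * math.pi * i / 19,) for i in range(20)]
ys = [math.sin(x[0]) for x in xs]
centers = [(2.0 * math.pi * a / 6,) for a in range(7)]
coeffs = fit_rbf(xs, ys, centers)
estimate = predict((math.pi / 2,), centers, coeffs)
```

Here the three layers are the input coordinates, the radial units centered at `centers`, and the linear output sum. Only the output weights are learned in this sketch; the centers and width stay fixed.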
