A Tree-Structured Algorithm for Reducing Computation in Networks with Separable Basis Functions

I describe a new algorithm for approximating continuous functions in high-dimensional input spaces. The algorithm builds a tree-structured network of variable size, which is determined both by the distribution of the input data and by the function to be approximated. Unlike other tree-structured algorithms, this one learns through completely local mechanisms, and its weights and structure are modified incrementally as data arrive. Computation in the tree structure is made efficient by exploiting possible low-order dependencies between the output and the individual dimensions of the input. The algorithm is related to the ideas behind k-d trees (Bentley 1975), CART (Breiman et al. 1984), and MARS (Friedman 1988). I present an example that predicts future values of the Mackey-Glass differential delay equation.
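The abstract gives no equations, so the following is only a minimal Python sketch of the general idea, built on assumptions of our own rather than on the paper's exact rules. It assumes one-dimensional triangular ("hat") basis functions with compact support, a local Widrow-Hoff (LMS) weight update, and a hypothetical growth rule that splits a leaf when a running mean of its squared weight changes exceeds a threshold. Each node multiplies its own hat function by its parent's output, so every node stands for a separable (product) basis function. The first level is additive, one set of hats per input dimension, and higher-order products appear only where the tree grows. In this sketch the computational saving comes from pruning: when a node's product is zero, its whole subtree is skipped. The names `SeparableTree`, `hat_layer`, `split_threshold`, `decay` and `max_depth` are illustrative only.

```python
import numpy as np


class Node:
    """A tree node holding one 1-D triangular ("hat") basis function.

    The node's effective basis function is its own hat times those of all
    its ancestors, so every node stands for a separable (product) function."""

    def __init__(self, dim, center, width, parent_dims):
        self.dim, self.center, self.width = dim, center, width
        self.used_dims = parent_dims | {dim}  # input dimensions used on the path from the root
        self.weight = 0.0
        self.children = []
        self.update_power = 0.0  # running mean of squared weight changes (hypothetical split signal)

    def hat(self, x):
        # Compact support: exactly zero when |x_dim - center| >= width.
        return max(0.0, 1.0 - abs(x[self.dim] - self.center) / self.width)


def hat_layer(dim, n, parent_dims=frozenset()):
    """n evenly spaced hats on [0, 1] along one dimension (they sum to 1 there)."""
    width = 1.0 / (n - 1)
    return [Node(dim, float(c), width, parent_dims) for c in np.linspace(0.0, 1.0, n)]


class SeparableTree:
    def __init__(self, n_dims, n_per_dim=5, lr=0.2, split_threshold=1e-4,
                 decay=0.99, max_depth=2):
        self.n_dims, self.n_per_dim = n_dims, n_per_dim
        self.lr, self.split_threshold = lr, split_threshold
        self.decay, self.max_depth = decay, max_depth
        # First level: an additive model, one set of hats per input dimension.
        self.roots = [node for d in range(n_dims) for node in hat_layer(d, n_per_dim)]

    def _active(self, x):
        """Return (node, product, depth) for every node with a nonzero output.

        A zero product prunes the node's entire subtree, because every
        descendant multiplies that zero by further factors."""
        active, stack = [], [(root, 1.0, 1) for root in self.roots]
        while stack:
            node, parent_prod, depth = stack.pop()
            prod = parent_prod * node.hat(x)
            if prod == 0.0:
                continue
            active.append((node, prod, depth))
            stack.extend((child, prod, depth + 1) for child in node.children)
        return active

    def predict(self, x):
        return sum(node.weight * prod for node, prod, _ in self._active(x))

    def train_step(self, x, y):
        active = self._active(x)
        err = y - sum(node.weight * prod for node, prod, _ in active)
        for node, prod, depth in active:
            dw = self.lr * err * prod  # local Widrow-Hoff (LMS) update
            node.weight += dw
            node.update_power = self.decay * node.update_power + (1.0 - self.decay) * dw * dw
            if not node.children and depth < self.max_depth and node.update_power > self.split_threshold:
                self._split(node)
        return err

    def _split(self, node):
        free = [d for d in range(self.n_dims) if d not in node.used_dims]
        if free:
            # Hypothetical choice: refine along the lowest-numbered unused dimension.
            # New children start at weight 0, so the current output is unchanged.
            node.children = hat_layer(free[0], self.n_per_dim, node.used_dims)
            node.update_power = 0.0


rng = np.random.default_rng(0)
net = SeparableTree(n_dims=2)
for _ in range(5000):
    x = rng.random(2)
    net.train_step(x, x[0] * x[1])  # target with a product (interaction) term

X_test = rng.random((500, 2))
mse = float(np.mean([(net.predict(x) - x[0] * x[1]) ** 2 for x in X_test]))
n_grown = sum(len(root.children) > 0 for root in net.roots)
print(f"test MSE = {mse:.5f}, roots that grew children = {n_grown}")
```

The target x0·x1 contains an interaction term that the additive first level cannot represent on its own. The product nodes grown below the roots are what let the sketch capture it. Starting from an additive layer mirrors the abstract's emphasis on low-order dependencies. The splitting rule, however, is a simple stand-in, and it can also fire during the large early updates before the weights settle.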

[1] Dennis Gabor et al. A universal nonlinear filter, predictor and simulator which optimizes itself by a learning process. 1961.

[2] J. Orbach. Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms. 1962.

[3] J. Morgan et al. Problems in the Analysis of Survey Data, and a Proposal. 1963.

[4] James N. Morgan et al. Searching for structure (alias-AID-III): an approach to analysis of substantial bodies of micro-data and documentation for a computer program (successor to the Automatic Interaction Detector Program). 1971.

[5] A. G. Ivakhnenko et al. Polynomial Theory of Complex Systems. 1971, IEEE Trans. Syst. Man Cybern.

[6] Jon Louis Bentley et al. Multidimensional binary search trees used for associative searching. 1975, CACM.

[7] Saburo Ikeda et al. Sequential GMDH Algorithm and Its Application to River Flow Prediction. 1976.

[8] L. Glass et al. Oscillation and chaos in physiological control systems. 1977, Science.

[9] Leo Breiman et al. Classification and Regression Trees. 1984.

[10] C. J. Stone et al. Additive Regression and Other Nonparametric Models. 1985.

[11] Geoffrey E. Hinton et al. Learning internal representations by error propagation. 1986.

[12] James L. McClelland et al. Parallel distributed processing: explorations in the microstructure of cognition, vol. 1: foundations. 1986.

[13] Robert M. Farber et al. How Neural Nets Work. 1987, NIPS.

[14] Guo-Zheng Sun et al. A Novel Net that Learns Sequential Decision Process. 1987, NIPS.

[15] A. Lapedes et al. Nonlinear Signal Processing Using Neural Networks. 1987.

[16] Filson H. Glanz et al. Application of a General Learning Algorithm to the Control of Robotic Manipulators. 1987.

[17] M. J. D. Powell et al. Radial basis functions for multivariable interpolation: a review. 1987.

[18] Bernard Widrow et al. Adaptive switching circuits. 1988.

[19] W. Cleveland et al. Locally Weighted Regression: An Approach to Regression Analysis by Local Fitting. 1988.

[20] John E. Moody et al. Fast Learning in Multi-Resolution Hierarchies. 1988, NIPS.

[21] John Moody et al. Fast Learning in Networks of Locally-Tuned Processing Units. 1989, Neural Computation.

[22] Gérard Dreyfus et al. Single-layer learning revisited: a stepwise procedure for building and training a neural network. 1989, NATO Neurocomputing.

[23] M. F. Tenorio et al. Self-Organizing Neural Network for Optimum Supervised Learning. 1989.

[24] Christian Lebiere et al. The Cascade-Correlation Learning Architecture. 1989, NIPS.

[25] J. Nadal et al. Learning in feedforward layered networks: the tiling algorithm. 1989.

[26] F. Girosi et al. Networks for approximation and learning. 1990, Proc. IEEE.

[27] David E. Rumelhart et al. Predicting the Future: a Connectionist Approach. 1990, Int. J. Neural Syst.

[28] Geoffrey E. Hinton et al. Distributed Representations. 1986, The Philosophy of Artificial Intelligence.

[29] J. Friedman. Multivariate adaptive regression splines. 1990.

[30] James D. Keeler et al. Predicting the Future: Advantages of Semilocal Units. 1991, Neural Computation.

[31] Terence D. Sanger et al. A tree-structured adaptive network for function approximation in high-dimensional spaces. 1991, IEEE Trans. Neural Networks.

[32] J. Friedman et al. Multivariate adaptive regression splines. 1991.

[33] Michael I. Jordan et al. Task Decomposition Through Competition in a Modular Connectionist Architecture: The What and Where Vision Tasks. 1990, Cogn. Sci.

[34] Geoffrey E. Hinton et al. Adaptive Mixtures of Local Experts. 1991, Neural Computation.

[35] Gérard Dreyfus et al. Handwritten digit recognition by neural networks with single-layer training. 1992, IEEE Trans. Neural Networks.

[36] Lyle H. Ungar et al. A Neural Network Architecture That Computes Its Own Reliability. 1992.