Multivariate Regression and Machine Learning with Sums of Separable Functions

We present an algorithm for learning (or estimating) a function of many variables from scattered data. The function is approximated by a sum of separable functions, following the paradigm of separated representations. The central fitting algorithm is linear in both the number of data points and the number of variables and, thus, is suitable for large data sets in high dimensions. We present numerical evidence for the utility of these representations. In particular, we show that our method outperforms other methods on several benchmark data sets.
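As a concrete illustration of the idea (a minimal sketch, not the authors' code; all function names, the polynomial basis, and the parameter choices are hypothetical), the snippet below fits a model of the form f(x_1, ..., x_d) ≈ sum_{l=1}^{r} prod_{j=1}^{d} g_l^j(x_j), where each one-dimensional factor g_l^j is a low-degree polynomial, using alternating least squares: with all directions but one held fixed, the update for the remaining direction is an ordinary linear least-squares problem, so each sweep over the d directions costs work linear in the number of data points.

```python
# Hypothetical sketch: fit a sum of separable functions
#   f(x) ~= sum_l prod_j g_l^j(x_j)
# to scattered data by alternating least squares (ALS).  Each 1-D factor
# g_l^j is a low-degree polynomial; updating one coordinate direction at
# a time reduces to a linear least-squares problem, so the per-sweep cost
# is linear in the number of data points and in the number of variables.
import numpy as np

def basis(t, degree=2):
    """Monomial basis [1, t, ..., t^degree] evaluated at a 1-D array t."""
    return np.vstack([t**k for k in range(degree + 1)]).T        # (N, K)

def fit_separable(X, y, rank=3, degree=2, sweeps=20, ridge=1e-8):
    """Fit coefficients c[l, j, k] so that
       f(x) = sum_l prod_j sum_k c[l, j, k] * x_j**k  approximates y."""
    N, d = X.shape
    K = degree + 1
    rng = np.random.default_rng(0)
    c = rng.standard_normal((rank, d, K))

    # Basis values per direction and current factor values g_l^j(x_i).
    Phi = np.stack([basis(X[:, j], degree) for j in range(d)])    # (d, N, K)
    G = np.einsum('ljk,jnk->ljn', c, Phi)                         # (rank, d, N)

    for _ in range(sweeps):
        for j in range(d):
            # Product of all factors except direction j, per term and point.
            W = np.prod(np.delete(G, j, axis=1), axis=1)          # (rank, N)
            # Design matrix with columns indexed by (term l, basis k).
            A = (W[:, :, None] * Phi[j][None, :, :]).transpose(1, 0, 2)
            A = A.reshape(N, rank * K)
            # Regularized normal equations for this direction's coefficients.
            AtA = A.T @ A + ridge * np.eye(rank * K)
            sol = np.linalg.solve(AtA, A.T @ y)
            c[:, j, :] = sol.reshape(rank, K)
            G[:, j, :] = np.einsum('lk,nk->ln', c[:, j, :], Phi[j])
    return c

def predict(c, X, degree=2):
    d = X.shape[1]
    Phi = np.stack([basis(X[:, j], degree) for j in range(d)])
    G = np.einsum('ljk,jnk->ljn', c, Phi)
    return np.prod(G, axis=1).sum(axis=0)

if __name__ == "__main__":
    # Toy check: recover a simple separable target in 5 dimensions.
    rng = np.random.default_rng(1)
    X = rng.uniform(-1, 1, size=(2000, 5))
    y = np.prod(1 + X, axis=1) + 0.01 * rng.standard_normal(2000)
    c = fit_separable(X, y, rank=2, degree=1)
    print("RMSE:", np.sqrt(np.mean((predict(c, X, degree=1) - y) ** 2)))
```

The small ridge term is an assumption added here to guard against ill-conditioning when separable terms become nearly linearly dependent; the paper's actual fitting procedure may differ in basis, regularization, and stopping criteria.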
