Local Dimensionality Reduction for Non-Parametric Regression

Locally weighted regression is a computationally efficient technique for non-linear regression. For high-dimensional data, however, it becomes numerically brittle and computationally too expensive when many local models must be maintained simultaneously. Combining local linear dimensionality reduction with locally weighted regression is therefore a promising solution. In this context, we review linear dimensionality-reduction methods, compare their performance on non-parametric locally linear regression, and discuss how well they extend to incremental learning. The methods considered fall into three groups: (1) reducing dimensionality on the input data only, (2) modeling the joint input-output data distribution, and (3) optimizing the correlation between projection directions and output data. Group 1 contains principal component regression (PCR); group 2 contains principal component analysis (PCA) in joint input and output space, factor analysis, and probabilistic PCA; and group 3 contains reduced rank regression (RRR) and partial least squares (PLS) regression. Among the tested methods, only those in group 3 achieved robust performance even with a non-optimal number of components (factors or projection directions). In contrast, groups 1 and 2 failed when fewer components were used, since these methods rely on a correct estimate of the true intrinsic dimensionality. Within group 3, PLS is the only method for which a computationally efficient incremental implementation exists. Thus, PLS appears ideally suited as a building block for a locally weighted regressor in which projection directions are added incrementally on the fly.
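To make the proposed building block concrete, the following is a minimal batch sketch of a single locally weighted PLS model evaluated at one query point. It is an illustration under stated assumptions, not the implementation evaluated in the paper: the Gaussian weighting kernel, the univariate output, the function name `lwpls_predict`, and the parameters `bandwidth` and `n_components` are all hypothetical choices made here, and the incremental update that the abstract refers to is not shown.

```python
import numpy as np


def lwpls_predict(X, y, x_query, bandwidth=1.0, n_components=2):
    """Predict y at x_query with one locally weighted PLS model.

    Hypothetical helper for illustration: Gaussian kernel weights,
    scalar output, batch (non-incremental) computation.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_query = np.asarray(x_query, dtype=float)

    # Gaussian kernel weights centered on the query point (assumed kernel).
    sq_dist = np.sum((X - x_query) ** 2, axis=1)
    w = np.exp(-0.5 * sq_dist / bandwidth ** 2)
    w_sum = w.sum()

    # Weighted means; the regression works on mean-centered residuals.
    x_mean = (w @ X) / w_sum
    y_mean = (w @ y) / w_sum
    X_res = X - x_mean
    y_res = y - y_mean
    z = x_query - x_mean
    prediction = y_mean

    for _ in range(n_components):
        # Projection direction: weighted covariance of inputs and output.
        u = X_res.T @ (w * y_res)
        norm_u = np.linalg.norm(u)
        if norm_u < 1e-12:
            break  # nothing left in the output residual to explain
        u = u / norm_u

        # Scores along the direction and univariate weighted regression.
        s = X_res @ u
        denom = w @ (s * s)
        if denom < 1e-12:
            break
        beta = (w @ (s * y_res)) / denom
        p = X_res.T @ (w * s) / denom  # input loading for deflation

        # Accumulate the query prediction, then deflate the query.
        s_query = z @ u
        prediction += beta * s_query
        z = z - s_query * p

        # Deflate inputs and output before extracting the next direction.
        X_res = X_res - np.outer(s, p)
        y_res = y_res - beta * s

    return float(prediction)
```

Because each direction is chosen from the current residuals and fitted by its own univariate regression, directions can be added one at a time until the residual stops improving, which is the property that makes this family of methods attractive when the intrinsic dimensionality is not known in advance.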
