Unsupervised slow subspace-learning from stationary processes

We propose a method of unsupervised learning from stationary, vector-valued processes. A projection onto a low-dimensional subspace is selected on the basis of an objective function that rewards data variance and penalizes the variance of the velocity vector, thereby exploiting the short-time dependencies of the process. We prove bounds on the estimation error of the objective in terms of the β-mixing coefficients of the process. It is also shown that maximizing the objective minimizes an error bound for simple classification algorithms on a generic class of learning tasks. Experiments with image recognition demonstrate the algorithm's ability to learn geometrically invariant feature maps.
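The sketch below illustrates the type of trade-off the abstract describes: reward the variance of the projected data while penalizing the variance of the finite-difference "velocity" of consecutive observations. It is a minimal NumPy illustration, not the paper's algorithm; the function name slow_subspace, the weight a, and the eigen-decomposition shortcut are assumptions made here for concreteness.

```python
import numpy as np

def slow_subspace(X, d, a=1.0):
    """Illustrative sketch (not the paper's exact method): pick a
    d-dimensional subspace trading off large data variance against
    small velocity variance.

    X : array of shape (T, n), consecutive observations of the process
    d : target subspace dimension
    a : hypothetical weight on the velocity-variance penalty
    """
    X = X - X.mean(axis=0)            # center the data
    V = np.diff(X, axis=0)            # finite-difference "velocity" vectors
    C = X.T @ X / len(X)              # data covariance
    Cdot = V.T @ V / len(V)           # velocity covariance
    # Maximizing tr(P C P) - a * tr(P Cdot P) over rank-d orthogonal
    # projections amounts to taking the top-d eigenvectors of C - a * Cdot.
    w, U = np.linalg.eigh(C - a * Cdot)
    return U[:, np.argsort(w)[::-1][:d]]   # columns span the selected subspace
```

For instance, X could hold successive frames (or patch vectors) from a video; new data are then projected onto the returned columns to obtain slowly varying, approximately invariant features.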
