Unsupervised Slow Subspace-Learning from Stationary Processes

We propose a method of unsupervised learning from stationary, vector-valued processes. A low-dimensional subspace is selected on the basis of a criterion which rewards data-variance (like PCA) and penalizes the variance of the velocity vector, thus exploiting the short-time dependencies of the process. We prove error bounds in terms of the β-mixing coefficients and consistency for absolutely regular processes. Experiments with image recognition demonstrate the algorithm's ability to learn geometrically invariant feature maps.

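To make the criterion concrete: if one maximizes the projected data variance minus a weighted projected velocity variance over orthonormal projections, the optimal subspace is spanned by the top eigenvectors of the matrix C − a·Ċ, where C is the data covariance, Ċ the covariance of the discrete velocities, and a the penalty weight. The following minimal Python sketch illustrates this spectral reading of the criterion; it is an assumption-laden illustration, not the paper's exact algorithm, and the function name `slow_subspace` and the `penalty` parameter are hypothetical.

```python
import numpy as np

def slow_subspace(X, dim, penalty=1.0):
    """Sketch of slow subspace selection: reward data variance,
    penalize velocity variance (hypothetical interface).

    X: (T, d) array, one row per time step of the stationary process.
    dim: target subspace dimension.
    penalty: assumed trade-off weight between the two variance terms.
    """
    Xc = X - X.mean(axis=0)             # center the observations
    C = Xc.T @ Xc / len(Xc)             # data covariance (variance reward)
    V = np.diff(X, axis=0)              # discrete velocities x_{t+1} - x_t
    C_dot = V.T @ V / len(V)            # velocity covariance (penalty term)
    # The trace criterion over orthonormal projections is maximized by the
    # top `dim` eigenvectors of the reward-minus-penalty matrix.
    evals, evecs = np.linalg.eigh(C - penalty * C_dot)
    order = np.argsort(evals)[::-1][:dim]
    return evecs[:, order]              # (d, dim) orthonormal basis

# Usage: project a toy 10-dimensional process onto a 2-dimensional slow subspace.
rng = np.random.default_rng(0)
slow = 0.1 * np.cumsum(rng.standard_normal((500, 10)), axis=0)  # slowly varying part
X = slow + rng.standard_normal((500, 10))                       # plus fast noise
P = slow_subspace(X, dim=2, penalty=0.5)
Y = X @ P                                                       # learned slow features
```

With `penalty=0` the sketch reduces to ordinary PCA on the centered data; increasing the penalty shifts the selected subspace toward directions in which the process changes slowly, which is the invariance-learning effect the abstract describes.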