Convergence analysis of local feature extraction algorithms

We investigate the asymptotic behavior of a general class of on-line Principal Component Analysis (PCA) learning algorithms, focusing on two recently proposed algorithms based on strictly local learning rules. We rigorously establish that the behavior of each algorithm is intimately related to an ordinary differential equation (ODE) obtained by suitably averaging over the training patterns, and we study the equilibria of these ODEs and their local stability properties. Our results imply, in particular, that local PCA algorithms should incorporate hierarchical rather than more competitive, symmetric decorrelation, because the hierarchical schemes yield superior performance. A minimal sketch contrasting the two decorrelation schemes follows.
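
In the framework analyzed here, an on-line algorithm of the form W_{t+1} = W_t + eta_t H(W_t, x_t) is associated with the averaged ODE dW/dt = E[H(W, x)], whose stable equilibria characterize the possible limits of the algorithm. The sketch below is an illustration, not the paper's exact formulation: it contrasts the two decorrelation schemes using two well-known local PCA rules, Sanger's generalized Hebbian algorithm as the hierarchical variant and Oja's subspace rule as the symmetric one; the data distribution, dimensions, learning rate, and step count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, steps, eta = 5, 2, 20000, 0.05  # inputs, units, iterations, learning rate

# Synthetic zero-mean data with a known covariance eigenstructure.
A = rng.normal(size=(n, n))
C = A @ A.T
C /= np.linalg.eigvalsh(C).max()      # normalize the spectral radius to 1
L = np.linalg.cholesky(C)             # x = L z has covariance C

def train(hierarchical):
    """Online update W += eta * (y x^T - D(y y^T) W), where D keeps the
    lower-triangular part (hierarchical) or the full matrix (symmetric)."""
    W = rng.normal(scale=0.1, size=(m, n))
    for _ in range(steps):
        x = L @ rng.normal(size=n)    # training pattern with covariance C
        y = W @ x                     # network output
        yy = np.outer(y, y)
        if hierarchical:
            yy = np.tril(yy)          # unit i decorrelates only against j <= i
        W += eta * (np.outer(y, x) - yy @ W)
    return W

# Leading eigenvectors of C, for comparison.
top = np.linalg.eigh(C)[1][:, ::-1][:, :m].T

for name, flag in [("hierarchical", True), ("symmetric", False)]:
    W = train(flag)
    # |cosine| between each learned row and the matching eigenvector: the
    # hierarchical rule recovers the eigenvectors individually, while the
    # symmetric rule in general only spans the principal subspace.
    cos = [abs(W[i] @ top[i]) / np.linalg.norm(W[i]) for i in range(m)]
    print(name, np.round(cos, 3))
```

Running this, the hierarchical variant typically reports cosines near 1 for both units, while the symmetric variant's individual rows need not align with any single eigenvector, which is the qualitative distinction the analysis makes precise.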
