Redundancy reduction with information-preserving nonlinear maps

The basic idea of linear principal component analysis (PCA) is to decorrelate the coordinates by an orthogonal linear transformation. In this paper we generalize this idea to the nonlinear case, and at the same time drop the usual restriction to Gaussian distributions. The linearity and orthogonality conditions of linear PCA are replaced by the condition of volume conservation, in order to avoid spurious information being generated by the nonlinear transformation. This leads us to a very general class of nonlinear transformations: symplectic maps. Furthermore, instead of minimizing the correlation, we minimize the redundancy measured at the output coordinates. This generalizes second-order statistics, which are valid only for Gaussian output distributions, to higher-order statistics. The proposed paradigm implements Barlow's redundancy-reduction principle for unsupervised feature extraction. The resulting factorial representation of the joint probability distribution presumably facilitates density estimation...
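
To make the two central ingredients concrete, the following is a minimal numerical sketch in Python/NumPy, not the authors' implementation. It uses an additive triangular coupling y1 = x1, y2 = x2 + tanh(w*x1), whose Jacobian determinant is identically 1; such a map is volume conserving, so the joint entropy is preserved, h(Y) = h(X) + E[log|det J|] = h(X). The redundancy R = sum_i h(Y_i) - h(Y) can then be reduced simply by reducing the sum of the marginal output entropies, estimated here with crude histograms. The function names, the single-parameter map, and the toy data are assumptions made purely for illustration.

# Illustrative sketch (assumed names, not the paper's code): a volume-conserving
# nonlinear map built from an additive triangular coupling with det J = 1.
import numpy as np

rng = np.random.default_rng(0)

def coupling_map(x, w):
    """Volume-conserving map: y1 = x1, y2 = x2 + tanh(w * x1).
    The Jacobian is lower triangular with unit diagonal, so det J = 1."""
    x1, x2 = x[:, 0], x[:, 1]
    return np.stack([x1, x2 + np.tanh(w * x1)], axis=1)

def marginal_entropy(samples, bins=30):
    """Crude histogram estimate of a one-dimensional differential entropy (nats)."""
    p, edges = np.histogram(samples, bins=bins, density=True)
    widths = np.diff(edges)
    mask = p > 0
    return -np.sum(p[mask] * np.log(p[mask]) * widths[mask])

def redundancy_proxy(y):
    """Sum of marginal entropies; the joint entropy is constant under the
    volume-conserving map, so lowering this proxy lowers the redundancy."""
    return sum(marginal_entropy(y[:, i]) for i in range(y.shape[1]))

# Toy data with a nonlinear dependency between the two coordinates.
x1 = rng.normal(size=5000)
x2 = -np.tanh(0.8 * x1) + 0.1 * rng.normal(size=5000)
x = np.stack([x1, x2], axis=1)

# Brute-force search over the single parameter; near the optimum the nonlinear
# dependence is removed and the output coordinates are (nearly) factorial.
ws = np.linspace(-2.0, 2.0, 81)
best_w = min(ws, key=lambda w: redundancy_proxy(coupling_map(x, w)))
print("redundancy proxy at w=0:", redundancy_proxy(coupling_map(x, 0.0)))
print("best w ~", best_w, " proxy:", redundancy_proxy(coupling_map(x, best_w)))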
