An Information-Maximization Approach to Blind Separation and Blind Deconvolution

We derive a new self-organizing learning algorithm that maximizes the information transferred in a network of nonlinear units. The algorithm does not assume any knowledge of the input distributions, and is defined here for the zero-noise limit. Under these conditions, information maximization has extra properties not found in the linear case (Linsker 1989). The nonlinearities in the transfer function are able to pick up higher-order moments of the input distributions and perform something akin to true redundancy reduction between units in the output representation. This enables the network to separate statistically independent components in the inputs: a higher-order generalization of principal components analysis. We apply the network to the source separation (or cocktail party) problem, successfully separating unknown mixtures of up to 10 speakers. We also show that a variant on the network architecture is able to perform blind deconvolution (cancellation of unknown echoes and reverberation in a speech signal). Finally, we derive dependencies of information transfer on time delays. We suggest that information maximization provides a unifying framework for problems in "blind" signal processing.
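To make the abstract's central idea concrete: in the zero-noise limit, maximizing the mutual information between inputs x and outputs y = g(Wx) reduces to maximizing the output entropy H(y), and for the logistic nonlinearity g the resulting stochastic gradient ascent rule is ΔW ∝ (W^T)^{-1} + (1 - 2y)x^T. Below is a minimal Python sketch of that rule on a toy two-source problem; the Laplacian sources (a stand-in for speech), the mixing matrix, the learning rate, the batch size, and the epoch count are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of the logistic-infomax rule on a toy two-source problem.
# Assumptions (not from the paper): Laplacian sources, the mixing matrix A,
# the learning rate, the batch size, and the number of epochs.
import numpy as np

rng = np.random.default_rng(0)

# Two statistically independent, super-Gaussian sources, linearly mixed
# by an unknown matrix A; only the mixtures X are observed.
n = 20000
S = rng.laplace(size=(2, n))
A = np.array([[1.0, 0.6],
              [0.5, 1.0]])
X = A @ S

W = np.eye(2)       # unmixing matrix to be learned
lr = 0.01           # illustrative step size
batch = 100

for epoch in range(100):
    perm = rng.permutation(n)
    for i in range(0, n, batch):
        x = X[:, perm[i:i + batch]]            # (2, batch) minibatch
        y = 1.0 / (1.0 + np.exp(-(W @ x)))     # logistic outputs g(Wx)
        # Gradient ascent on the output entropy H(y):
        #   dW = (W^T)^{-1} + (1 - 2y) x^T, averaged over the batch.
        # (The paper also adapts a bias w0; omitted here because the
        # toy sources are zero-mean.)
        W += lr * (np.linalg.inv(W.T) + ((1.0 - 2.0 * y) @ x.T) / batch)

# Up to the permutation and scaling ambiguity inherent in blind separation,
# W @ A should approach a scaled permutation matrix, i.e. each row of W @ X
# recovers one source up to sign and amplitude.
print(np.round(W @ A, 2))
```

Later work (Amari, Cichocki, and Yang, 1996) rescales this gradient by W^T W, removing the matrix inversion; the sketch keeps the paper's original form of the update.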

[1] S. Laughlin. A Simple Coding Procedure Enhances a Neuron's Information Capacity, 1981, Zeitschrift für Naturforschung C: Biosciences.

[2] A. Papoulis. Probability, Random Variables and Stochastic Processes (2nd ed.), 1984, McGraw-Hill.

[3] Christian Jutten et al. Space or time adaptive signal processing by neural network models, 1987.

[4] Ralph Linsker. An Application of the Principle of Maximum Information Preservation to Linear Systems, 1988, NIPS.

[5] Eric A. Vittoz et al. CMOS Integration of Herault-Jutten Cells for Separation of Sources, 1989, Analog VLSI Implementation of Neural Systems.

[6] Peter Földiák et al. Adaptation and decorrelation in the cortex, 1989.

[7] W. Bialek et al. Optimal Sampling of Natural Images: A Design Principle for the Visual System, 1990, NIPS.

[8] Simon Haykin. Adaptive Filter Theory (2nd ed.), 1991.

[9] Terrence J. Sejnowski et al. Competitive Anti-Hebbian Learning of Invariants, 1991, NIPS.

[10] Christian Jutten et al. Blind separation of sources, part I: An adaptive algorithm based on neuromimetic architecture, 1991, Signal Processing.

[11] J. J. Hopfield. Olfactory computation and object perception, 1991, Proceedings of the National Academy of Sciences of the United States of America.

[12] John C. Platt et al. Networks for the Separation of Sources that Are Superimposed and Delayed, 1991, NIPS.

[13] Pierre Comon et al. Blind separation of sources, part II: Problems statement, 1991, Signal Processing.

[14] Thomas M. Cover et al. Elements of Information Theory, 1991, Wiley.

[15] Esfandiar Sorouchyari. Blind separation of sources, part III: Stability analysis, 1991, Signal Processing.

[16] Ralph Linsker. Local Synaptic Learning Rules Suffice to Maximize Mutual Information in a Linear Network, 1992, Neural Computation.

[17] Geoffrey E. Hinton et al. Self-organizing neural network that discovers surfaces in random-dot stereograms, 1992, Nature.

[18] H. G. Schuster. Learning by maximizing the information transfer through nonlinear noisy neurons and "noise breakdown", 1992, Physical Review A.

[19] Andreas G. Andreou et al. Current-mode subthreshold MOS implementation of the Herault-Jutten autoadaptive network, 1992.

[20] Jürgen Schmidhuber. Learning Factorial Codes by Predictability Minimization, 1992, Neural Computation.

[21] Simon Haykin et al. Blind equalization formulated as a self-organized learning process, 1992, Conference Record of the Twenty-Sixth Asilomar Conference on Signals, Systems & Computers.

[22] Roberto Battiti. First- and Second-Order Methods for Learning: Between Steepest Descent and Newton's Method, 1992, Neural Computation.

[23] Gilles Burel. Blind separation of sources: A nonlinear neural algorithm, 1992, Neural Networks.

[24] Joseph J. Atick et al. Convergent Algorithm for Sensory Receptive Field Development, 1993, Neural Computation.

[25] Simon Haykin. Neural Networks: A Comprehensive Foundation, 1994.

[26] J.-P. Nadal et al. Nonlinear neurons in the low-noise limit: a factorial code maximizes information transfer, 1994, Network, 5.

[27] Juha Karhunen et al. Representation and separation of signals using nonlinear PCA type learning, 1994, Neural Networks.

[28] Schuster et al. Separation of a mixture of independent signals using time delayed correlations, 1994, Physical Review Letters.

[29] Pierre Comon. Independent component analysis, a new concept?, 1994, Signal Processing.

[30] Terrence J. Sejnowski et al. A Non-linear Information Maximisation Algorithm that Performs Blind Separation, 1994, NIPS.

[31] Ehud Weinstein et al. Criteria for multichannel signal separation, 1994, IEEE Transactions on Signal Processing.

[32] David J. Field. What Is the Goal of Sensory Coding?, 1994, Neural Computation.

[33] Andreas G. Andreou et al. Analog CMOS integration and experimentation with an autoadaptive independent component analyzer, 1995.

[34] Terrence J. Sejnowski et al. Adaptive separation of mixed broadband sound sources with delays by a beamforming Herault-Jutten network, 1995.

[35] Y. Baram et al. Multi-Dimensional Density Shaping by Sigmoidal Networks, 1995, Proceedings of ICNN'95 - International Conference on Neural Networks.

[36] Gustavo Deco et al. Nonlinear higher-order statistical decorrelation by volume-conserving neural architectures, 1995, Neural Networks.

[37] L. Parra et al. Redundancy reduction with information-preserving nonlinear maps, 1995.

[38] Lucas C. Parra et al. Statistical Independence and Novelty Detection with Information Preserving Nonlinear Maps, 1996, Neural Computation.