Maximizing Mutual Information

Introduction

Consider the problem of getting a neural network to associate an appropriate response with an image sequence. The obvious approach is to use supervised training. If the network has around 10^14 parameters and only lives for around 10^9 seconds, the supervision signal had better contain at least 10^5 bits per second to make use of the capacity of the synapses. It is not immediately obvious where such a rich supervision signal could come from. A more promising approach depends on the observation that images are not random but are generated by physical processes of limited complexity, and that the appropriate response to an image nearly always depends on the physical causes of the image rather than on the pixel intensities. This suggests that an unsupervised learning process should be used to solve the difficult problem of extracting the underlying causes, and decisions about responses can be left to a separate learning algorithm that takes the underlying causes rather than the raw sensory data as its inputs. Unsupervised learning can usually be viewed as a method of modeling the probability density of the inputs, so the rich sensory input itself can provide the 10^5 bits per second of constraint that is required to make use of the capacity of the synapses.

The papers in this collection provide a sample of research on unsupervised learning. Some areas and important contributions are not represented, either because an appropriate paper did not appear in Neural Computation or because of the limited space that was available. One entire area of research in unsupervised learning, self-organizing map formation, will appear as a separate volume in this series. Despite these limitations, the wide range of approaches included here serves as a guide to the development of the field of unsupervised learning.

Redundancy Reduction

One of the earliest formulations of unsupervised learning in the context of vision was the concept of redundancy reduction (Attneave 1954; Barlow 1959; Barlow 1989). The goal was to find ways to compress the information contained in images, a goal that was also pursued in the commercial arena to reduce the bandwidth needed to transmit images. In the case of the human visual system, information in the array of photoreceptors in the retina, which number around 100 million, is compressed and represented by spike trains in around 1 million ganglion cells whose axons form the optic nerve. Atick and Redlich (1993) used …
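To make the two order-of-magnitude estimates above concrete (the synaptic-capacity argument and the retinal compression ratio), here is a minimal back-of-the-envelope sketch in Python; the figures are the rough estimates quoted in the text, and the variable names are ours:

    # Rough arithmetic behind the estimates quoted above; all figures are
    # the text's order-of-magnitude estimates, not measured data.
    synapses = 1e14         # ~10^14 modifiable parameters (synapses)
    lifetime_s = 1e9        # ~10^9 seconds of learning time
    print(f"required constraint rate: ~{synapses / lifetime_s:.0e} bits/s")  # ~1e+05

    photoreceptors = 100e6  # ~100 million retinal photoreceptors
    ganglion_cells = 1e6    # ~1 million optic-nerve ganglion cells
    print(f"retinal compression: ~{photoreceptors / ganglion_cells:.0f}:1")  # ~100:1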

[1]  Ralph Linsker. Local Synaptic Learning Rules Suffice to Maximize Mutual Information in a Linear Network, 1992, Neural Computation.

[2]  John C. Platt. A Resource-Allocating Network for Function Interpolation, 1991, Neural Computation.

[3]  Peter Dayan, et al. Factor Analysis Using Delta-Rule Wake-Sleep Learning, 1997, Neural Computation.

[4]  Terrence J. Sejnowski, et al. An Information-Maximization Approach to Blind Separation and Blind Deconvolution, 1995, Neural Computation.

[5]  Terrence J. Sejnowski, et al. Independent Component Analysis Using an Extended Infomax Algorithm for Mixed Sub-Gaussian and Super-Gaussian Sources, 1999, Neural Computation.

[6]  Eric Mjolsness, et al. Learning with Preknowledge: Clustering with Point and Graph Matching Distance Measures, 1996, Neural Computation.

[7]  E. W. Kairiss, et al. Hebbian synapses: biophysical mechanisms and algorithms, 1990, Annual Review of Neuroscience.

[8]  Peter Földiák. Learning Invariance from Transformation Sequences, 1991, Neural Computation.

[9]  Radford M. Neal. A new view of the EM algorithm that justifies incremental and other variants, 1993.

[10]  Geoffrey E. Hinton, et al. Learning Population Codes by Minimizing Description Length, 1993, Neural Computation.

[11]  Kechen Zhang, et al. Emergence of Position-Independent Detectors of Sense of Rotation and Dilation with Hebbian Learning: An Analysis, 1999, Neural Computation.

[12]  Shun-ichi Amari. Natural Gradient Works Efficiently in Learning, 1998, Neural Computation.

[13]  Geoffrey E. Hinton, et al. The Helmholtz Machine, 1995, Neural Computation.

[14]  A. Hyvärinen, et al. A fast fixed-point algorithm for independent component analysis, 1997, Neural Computation.

[15]  F. Attneave. Some informational aspects of visual perception, 1954, Psychological Review.

[16]  Francis Crick, et al. The function of dream sleep, 1983, Nature.

[17]  K. Miller, et al. Ocular dominance column development: analysis and simulation, 1989, Science.

[18]  J. J. Hopfield, et al. ‘Unlearning’ has a stabilizing effect in collective memories, 1983, Nature.

[19]  Joseph J. Atick, et al. Convergent Algorithm for Sensory Receptive Field Development, 1993, Neural Computation.

[20]  David J. Field, et al. Emergence of simple-cell receptive field properties by learning a sparse code for natural images, 1996, Nature.

[21]  Geoffrey E. Hinton, et al. Learning and relearning in Boltzmann machines, 1986.

[22]  Pierre Comon. Independent component analysis, A new concept?, 1994, Signal Processing.

[23]  Christian Lebiere, et al. The Cascade-Correlation Learning Architecture, 1989, NIPS.

[24]  E. Bienenstock, et al. Theory for the development of neuron selectivity: orientation specificity and binocular interaction in visual cortex, 1982, The Journal of Neuroscience.

[25]  James V. Stone. Learning Perceptually Salient Visual Parameters Using Spatiotemporal Smoothness Constraints, 1996, Neural Computation.

[26]  H. Markram, et al. Regulation of Synaptic Efficacy by Coincidence of Postsynaptic APs and EPSPs, 1997, Science.

[27]  D. Hubel, et al. Receptive fields and functional architecture of monkey striate cortex, 1968, The Journal of Physiology.

[28]  Terrence J. Sejnowski, et al. The “independent components” of natural scenes are edge filters, 1997, Vision Research.

[29]  D. Rubin, et al. Maximum likelihood from incomplete data via the EM algorithm (with discussion), 1977.

[30]  Terrence J. Sejnowski, et al. Unsupervised Learning, 2018, Encyclopedia of GIS.

[31]  Terrence J. Sejnowski, et al. Self-Organizing Map Formation: Foundations of Neural Computation, 2001.

[32]  Geoffrey E. Hinton, et al. Learning Mixture Models of Spatial Coherence, 1993, Neural Computation.

[33]  R. Linsker. From basic network principles to neural architecture: emergence of spatial-opponent cells, 1986, Proceedings of the National Academy of Sciences of the United States of America.

[34]  Nanda Kambhatla, et al. Dimension Reduction by Local Principal Component Analysis, 1997, Neural Computation.

[35]  David J. Field. What Is the Goal of Sensory Coding? 1994, Neural Computation.