Contextually guided unsupervised learning using local multivariate binary processors

We consider the role of contextual guidance in learning and processing within multi-stream neural networks. Earlier work (Kay and Phillips, 1994, 1996; Phillips et al., 1995) showed how the goals of feature discovery and associative learning could be fused within a single objective and made precise using information theory, in such a way that local binary processors could extract a single feature that is coherent across streams. In this paper we consider multi-unit local processors with multivariate binary outputs, which enable a greater number of coherent features to be extracted. Using the Ising model, we define a class of information-theoretic objective functions, together with local approximations to them, and derive the learning rules in both cases. These rules have similarities to, and differences from, the celebrated BCM rule. Local and global versions of infomax appear as by-products of the general approach, as do multivariate versions of coherent infomax. Focussing on the more biologically plausible local rules, we describe computational experiments designed to investigate specific properties of the processors and of the general approach. The main conclusions are: (1) the local methodology introduced in the paper has the required functionality; (2) different units within the multi-unit processors learned to respond to different aspects of their receptive fields; (3) the units within each processor generally produced a distributed code in which the outputs were correlated and which was robust to damage, and in the special case where the number of units available was only just sufficient to transmit the relevant information, a form of competitive learning was produced; (4) the contextual connections enabled the information correlated across streams to be extracted and, by improving feature detection with weak or noisy inputs, they played a useful role in short-term processing and in improving generalization; (5) the methodology allows the statistical associations between distributed self-organizing population codes to be learned.
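To make the information-theoretic setting concrete, the sketch below estimates, from empirical counts over binary samples, the kinds of quantities that contextually guided learning trades off: the information a processor's output X carries about its receptive-field input R, about its contextual-field input C, and the three-way information shared across output and both streams. This is a minimal numerical illustration only, not the paper's Ising-model objective or its learning rules; the toy data generator, variable names, and function names are assumptions introduced for this example.

```python
import numpy as np

# Toy illustration (not the paper's exact objective or learning rules):
# for binary variables X (a processor's output), R (its receptive-field
# input) and C (its contextual-field input), estimate mutual-information
# components from empirical joint counts.

def entropy(p):
    """Shannon entropy (in bits) of a probability table p."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def info_components(samples):
    """samples: (N, 3) array of binary triples (x, r, c)."""
    joint = np.zeros((2, 2, 2))
    for x, r, c in samples:
        joint[x, r, c] += 1
    joint /= joint.sum()

    H_XRC = entropy(joint)
    H_XR = entropy(joint.sum(axis=2))          # marginal over C -> p(x, r)
    H_XC = entropy(joint.sum(axis=1))          # marginal over R -> p(x, c)
    H_RC = entropy(joint.sum(axis=0))          # marginal over X -> p(r, c)
    H_X = entropy(joint.sum(axis=(1, 2)))
    H_R = entropy(joint.sum(axis=(0, 2)))
    H_C = entropy(joint.sum(axis=(0, 1)))

    I_XR = H_X + H_R - H_XR                    # I(X;R)
    I_XC = H_X + H_C - H_XC                    # I(X;C)
    I_XR_given_C = H_XC + H_RC - H_XRC - H_C   # I(X;R|C)
    I_XRC = I_XR - I_XR_given_C                # three-way information I(X;R;C)
    return I_XR, I_XC, I_XR_given_C, I_XRC

# Example: R and C share a common binary "feature"; X tracks R with some noise.
rng = np.random.default_rng(0)
f = rng.integers(0, 2, 10000)                     # feature shared across streams
r = np.where(rng.random(10000) < 0.9, f, 1 - f)   # receptive-field input
c = np.where(rng.random(10000) < 0.9, f, 1 - f)   # contextual-field input
x = np.where(rng.random(10000) < 0.95, r, 1 - r)  # processor output tracks R
print(info_components(np.column_stack([x, r, c])))
```

For this toy generator the three-way term I(X;R;C) comes out clearly positive, because the two streams share a feature that the output tracks; if the contextual stream were statistically independent of the receptive-field stream, it would be zero in the infinite-sample limit.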

[1] U. Polat, et al. The architecture of perceptual spatial interactions, 1994, Vision Research.

[2] I. Jolliffe, et al. Nonlinear Multivariate Analysis, 1992.

[3] Ralph Linsker. Self-organization in a perceptual network, 1988, Computer.

[4] James V. Stone. Learning Perceptually Salient Visual Parameters Using Spatiotemporal Smoothness Constraints, 1996, Neural Computation.

[5] W. Singer, et al. Selection of intrinsic horizontal connections in the visual cortex by correlated neuronal activity, 1992, Science.

[6] W. Singer. Synchronization of cortical activity and its putative role in information processing and learning, 1993, Annual Review of Physiology.

[7] Joseph J. Atick, et al. Convergent Algorithm for Sensory Receptive Field Development, 1993, Neural Computation.

[8] C. Gilbert, et al. Synaptic physiology of horizontal connections in the cat's visual cortex, 1991, The Journal of Neuroscience.

[9] C. Gilbert, et al. Long-term changes in synaptic strength along specific intrinsic pathways in the cat visual cortex, 1993, The Journal of Physiology.

[10] D. Massaro, et al. Integration versus interactive activation: The joint influence of stimulus and context in perception, 1991, Cognitive Psychology.

[11] A. Norman Redlich. Redundancy Reduction as a Strategy for Unsupervised Learning, 1993, Neural Computation.

[12] Donald Geman, et al. Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images, 1984, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13] Richard W. Hamming. Coding and Information Theory, Prentice-Hall.

[14] Ralf Der, et al. Local online learning of coherent information, 1998, Neural Networks.

[15] C. Gilbert, et al. Improvement in visual sensitivity by changes in local context: Parallel studies in human observers and in V1 of alert monkeys, 1995, Neuron.

[16] Jim Kay, et al. Activation Functions, Computational Goals, and Learning Rules for Local Processors with Contextual Guidance, 1997, Neural Computation.

[17] John G. Taylor, et al. Information Theory and Neural Networks, 1993.

[18] Jim Kay, et al. Feature discovery under contextual supervision using mutual information, 1992, Proceedings of the IJCNN International Joint Conference on Neural Networks.

[19] U. Polat, et al. Lateral interactions between spatial channels: Suppression and facilitation revealed by lateral masking experiments, 1993, Vision Research.

[20] J. Besag. Spatial Interaction and the Statistical Analysis of Lattice Systems, 1974.

[21] Helen Suzanna Becker. An information-theoretic unsupervised learning algorithm for neural networks, 1993.

[22] J. J. Hopfield. Neural networks and physical systems with emergent collective computational abilities, 1982, Proceedings of the National Academy of Sciences of the United States of America.

[23] Nathan Intrator, et al. BCM theory of visual cortical plasticity, 1998.

[25] W. Singer, et al. In search of common foundations for cortical computation, 1997, Behavioral and Brain Sciences.

[26] H. Hotelling. Relations Between Two Sets of Variates, 1936.

[27] Jürgen Schmidhuber, et al. Discovering Predictable Classifications, 1993, Neural Computation.

[28] Jim Kay, et al. The discovery of structure by multi-stream networks of local processors with contextual guidance, 1995.

[29] E. Ising. Beitrag zur Theorie des Ferromagnetismus, 1925.

[30] P. McCullagh, et al. Generalized Linear Models, 1992.

[31] Terrence J. Sejnowski, et al. Unsupervised Learning.

[32] Peter Green. Markov Chain Monte Carlo in Practice, 1996.

[33] Michael A. Arbib. The Handbook of Brain Theory and Neural Networks, 1995, A Bradford Book.

[34] Suzanna Becker. Mutual information maximization: models of cortical self-organization, 1996, Network.

[35] N. Daw, et al. The effect of varying stimulus intensity on NMDA-receptor activity in cat visual cortex, 1990, Journal of Neurophysiology.

[36] Mark D. Plumbley. Information Theory and Neural Networks, 1993.

[37] C. Gilbert. Horizontal integration and cortical dynamics, 1992, Neuron.

[38] M. Hill, et al. Nonlinear Multivariate Analysis, 1990.

[39] M. Bear, et al. Experience-dependent modification of synaptic plasticity in visual cortex, 1996, Nature.

[40] Geoffrey E. Hinton, et al. Self-organizing neural network that discovers surfaces in random-dot stereograms, 1992, Nature.

[41] Sylvia Richardson, et al. Markov Chain Monte Carlo in Practice, 1997.