Learning Continuous Probability Distributions with Symmetric Diffusion Networks

In this article we present symmetric diffusion networks, a family of networks that instantiate the principles of continuous, stochastic, adaptive, and interactive propagation of information. Using methods of Markovian diffusion theory, we formalize the activation dynamics of these networks and then show that they can be trained to reproduce entire multivariate probability distributions on their outputs using the contrastive Hebbian learning rule (CHL). We show that CHL performs gradient descent on an error function that captures the differences between desired and obtained continuous multivariate probability distributions. This allows the learning algorithm to go beyond expected values of output units and to approximate complete probability distributions over continuous multivariate activation spaces. We argue that learning continuous distributions is an important task underlying a variety of real-life situations that were beyond the scope of previous connectionist networks. Deterministic networks, such as those trained with backpropagation, cannot learn this task because they are limited to learning average values of independent output units. Previous stochastic connectionist networks could learn probability distributions, but they were limited to discrete variables. Simulations show that symmetric diffusion networks can be trained with the CHL rule to approximate discrete and continuous probability distributions of various types.
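To make the two-phase structure of contrastive Hebbian learning concrete, the following is a minimal sketch, not the paper's implementation. It assumes Langevin-style diffusion dynamics (drift toward a squashed net input plus Gaussian noise) and the standard CHL update, in which the weight change is proportional to the difference between unit co-activations averaged over a clamped phase (visible units fixed to training samples) and a free phase. All names (`diffuse`, `chl_step`, `n_vis`, etc.) and parameter values are illustrative assumptions rather than the paper's specification.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 2 visible units, 3 hidden units (not taken from the paper).
n_vis, n_hid = 2, 3
n = n_vis + n_hid
W = np.zeros((n, n))            # symmetric weights, zero diagonal
dt, sigma = 0.01, 1.0           # integration step and noise amplitude

def diffuse(x, clamp=None, steps=200):
    """Langevin-style diffusion: drift toward the squashed net input plus noise."""
    for _ in range(steps):
        drift = np.tanh(W @ x) - x
        x = x + dt * drift + np.sqrt(dt) * sigma * rng.standard_normal(n)
        if clamp is not None:
            x[:n_vis] = clamp                        # clamped phase: fix visible units
    return x

def chl_step(batch, lr=0.05):
    """One contrastive Hebbian update estimated from a batch of visible samples."""
    global W
    pos = np.zeros((n, n))
    neg = np.zeros((n, n))
    for v in batch:
        x_plus = diffuse(rng.standard_normal(n), clamp=v)    # clamped (positive) phase
        x_minus = diffuse(rng.standard_normal(n))            # free (negative) phase
        pos += np.outer(x_plus, x_plus)
        neg += np.outer(x_minus, x_minus)
    dW = lr * (pos - neg) / len(batch)
    dW = (dW + dW.T) / 2                                     # keep weights symmetric
    np.fill_diagonal(dW, 0.0)
    W += dW

# Toy usage: drive the two visible units toward a bimodal target distribution.
def target_sample(_):
    return rng.choice([-1.0, 1.0], size=n_vis) + 0.1 * rng.standard_normal(n_vis)

for epoch in range(50):
    chl_step([target_sample(i) for i in range(20)])
```

The averaged outer products in the two phases play the role of the clamped and free correlations that the CHL rule contrasts; because the visible units are left stochastic in the free phase, the update shapes the whole distribution of visible activations rather than only its mean.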
