Early Inference in Energy-Based Models Approximates Back-Propagation

We show that Langevin MCMC inference in an energy-based model with latent variables has the property that the early steps of inference, starting from a stationary point, correspond to propagating error gradients into internal layers, much as back-propagation does. The error that is back-propagated is defined with respect to visible units that have received an outside driving force pushing them away from the stationary point. Back-propagated error gradients correspond to temporal derivatives of the activations of hidden units. This observation could be an element of a theory explaining how brains perform credit assignment in deep hierarchies as efficiently as back-propagation does. In this theory, the continuous-valued latent variables correspond to voltage potentials averaged across time, spikes, and possibly neurons in the same minicolumn, and neural computation corresponds to approximate inference and error back-propagation at the same time.
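
As a rough numerical illustration of the central claim, the following sketch (not the authors' code) builds a small layered, symmetrically connected network with a Hopfield-style energy, relaxes it to a stationary point using noise-free gradient dynamics (the deterministic part of a Langevin update), and then applies a weak outside force pushing the output units toward a target. The layer sizes, the tanh nonlinearity, the particular energy function, and the form and strength (`beta`) of the nudge are illustrative assumptions; the only point is that the early movement of the hidden units lines up with the error signal back-propagated through the same symmetric weights.

```python
import numpy as np

rng = np.random.default_rng(0)
nx, nh, ny = 5, 4, 3

# Symmetric ("tied") weights between consecutive layers.
W1 = rng.normal(scale=0.4, size=(nh, nx))   # input  <-> hidden
W2 = rng.normal(scale=0.4, size=(ny, nh))   # hidden <-> output

rho  = np.tanh
drho = lambda s: 1.0 - np.tanh(s) ** 2

x      = rng.normal(size=nx)                # clamped input
target = rng.normal(size=ny)                # desired output activation

def dynamics(h, y):
    """Gradient dynamics ds/dt = -dE/ds for the Hopfield-style energy
       E = 0.5||h||^2 + 0.5||y||^2 - rho(h).(W1 x) - rho(y).(W2 rho(h))."""
    dh = -h + drho(h) * (W1 @ x + W2.T @ rho(y))
    dy = -y + drho(y) * (W2 @ rho(h))
    return dh, dy

# --- Free phase: relax to a stationary point of the energy. ---
h, y = np.zeros(nh), np.zeros(ny)
dt = 0.1
for _ in range(5000):
    dh, dy = dynamics(h, y)
    h += dt * dh
    y += dt * dy

h0, y0 = h.copy(), y.copy()
err = target - rho(y0)          # output error at the stationary point

# --- Weakly nudged phase: an outside driving force pushes the outputs. ---
beta = 0.01
for _ in range(3):              # only the *early* steps of inference
    dh, dy = dynamics(h, y)
    y += dt * (dy + beta * err) # external force on the visible/output units
    h += dt * dh

early_dh = h - h0               # early movement of the hidden-unit states
                                # (scale is irrelevant; we compare directions)

# --- Error back-propagated through the same symmetric weights,
#     with activation derivatives evaluated at the stationary point. ---
delta_h = drho(h0) * (W2.T @ (drho(y0) * err))

cos = early_dh @ delta_h / (np.linalg.norm(early_dh) * np.linalg.norm(delta_h))
print(f"cosine(early hidden movement, back-propagated error) = {cos:.4f}")
```

Under these assumptions the printed cosine is close to 1: during the first few inference steps after the nudge, the hidden units move approximately in the direction prescribed by back-propagation, differing only in overall scale.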
