Learning to Combine Top-Down and Bottom-Up Signals in Recurrent Neural Networks with Attention over Modules

Robust perception relies on both bottom-up and top-down signals. Bottom-up signals consist of what is directly observed through sensation. Top-down signals consist of beliefs and expectations based on past experience and short-term memory, such as how the phrase "peanut butter and ..." will be completed. The optimal combination of bottom-up and top-down information remains an open question, but the manner of combination must be dynamic and dependent on both context and task. To effectively utilize the wealth of potential top-down information available, and to prevent the cacophony of intermixed signals in a bidirectional architecture, mechanisms are needed to restrict information flow. We explore deep recurrent neural network architectures in which bottom-up and top-down signals are dynamically combined using attention. Modularity of the architecture further restricts the sharing and communication of information. Together, attention and modularity direct information flow, which leads to reliable performance improvements in perceptual and language tasks, and in particular improves robustness to distractions and noisy data. We demonstrate on a variety of benchmarks in language modeling, sequential image classification, video prediction, and reinforcement learning that this bidirectional information flow improves results over strong baselines.
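
To make the mechanism described above concrete, the following PyTorch sketch shows one way attention could arbitrate between bottom-up and top-down signals in a modular recurrent layer. This is an illustration under stated assumptions, not the paper's exact architecture: each recurrent module forms a query from its own state and attends over two candidate signals, a bottom-up input encoding and a top-down projection of a higher layer's previous state. All names and dimensions (BidirectionalModularCell, key_dim, and so on) are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BidirectionalModularCell(nn.Module):
    """Illustrative sketch only. Each module uses key-query attention to
    weigh a bottom-up signal (the current input encoding) against a
    top-down signal (a projection of the higher layer's previous-step
    state), then updates its own GRU state."""

    def __init__(self, input_dim, hidden_dim, n_modules, key_dim=32):
        super().__init__()
        self.n_modules = n_modules
        # One small recurrent cell per module: modularity restricts
        # sharing of parameters and state across modules.
        self.cells = nn.ModuleList(
            [nn.GRUCell(input_dim, hidden_dim) for _ in range(n_modules)]
        )
        # Queries come from each module's state; keys and values come
        # from the bottom-up and top-down candidate signals.
        self.query = nn.Linear(hidden_dim, key_dim)
        self.key = nn.Linear(input_dim, key_dim)
        self.value = nn.Linear(input_dim, input_dim)

    def forward(self, bottom_up, top_down, h):
        # bottom_up: (batch, input_dim) current observation encoding
        # top_down:  (batch, input_dim) higher layer's prior-step state,
        #            projected to input_dim
        # h:         (batch, n_modules, hidden_dim)
        candidates = torch.stack([bottom_up, top_down], dim=1)  # (B, 2, D)
        keys = self.key(candidates)                             # (B, 2, K)
        values = self.value(candidates)                         # (B, 2, D)
        new_h = []
        for m in range(self.n_modules):
            q = self.query(h[:, m])                             # (B, K)
            # Scaled dot-product scores over {bottom-up, top-down}.
            scores = torch.einsum('bk,bjk->bj', q, keys)
            scores = scores / keys.shape[-1] ** 0.5
            attn = F.softmax(scores, dim=-1)                    # (B, 2)
            mixed = torch.einsum('bj,bjd->bd', attn, values)    # (B, D)
            new_h.append(self.cells[m](mixed, h[:, m]))
        return torch.stack(new_h, dim=1)
```

In this sketch, the per-module attention weights decide at each step how much a module listens to sensation versus expectation, and the separate per-module GRU cells stand in for the modularity that restricts information sharing; the actual model in the paper combines these ingredients differently and at multiple layers.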
