Inductive Biases for Deep Learning of Higher-Level Cognition

A fascinating hypothesis is that human and animal intelligence could be explained by a few principles rather than an encyclopedic list of heuristics. If that hypothesis were correct, we could more easily both understand our own intelligence and build intelligent machines. Just as in physics, the principles themselves would not suffice to predict the behavior of complex systems like brains, and substantial computation might be needed to simulate human-like intelligence. This hypothesis suggests that studying the kinds of inductive biases humans and animals exploit could both help clarify these principles and provide inspiration for AI research and neuroscience theories. Deep learning already exploits several key inductive biases, and this work considers a larger list, focusing on those that mostly concern higher-level and sequential conscious processing. Clarifying these particular principles matters because they could help us build AI systems that share humans' capacity for flexible out-of-distribution and systematic generalization, currently an area with a large gap between state-of-the-art machine learning and human intelligence.
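To make the notion of an inductive bias concrete, here is a minimal illustrative sketch (not from the paper) of one bias deep learning already exploits: the locality and weight sharing of convolutions, which encode an assumption of translation equivariance. It compares the parameter counts of a convolutional map and an unconstrained linear map between the same input and output shapes; the specific shapes and layer sizes are arbitrary choices for illustration.

```python
# Illustrative sketch (assumed example, not from the paper): weight sharing
# in a convolution as an inductive bias. Both layers map a 3x32x32 input to
# an 8x32x32 output, but the convolution commits to locality and translation
# equivariance, while the fully connected layer makes no such assumption.
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)  # a batch of one 3-channel 32x32 image

conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)  # local, weight-shared map
dense = nn.Linear(3 * 32 * 32, 8 * 32 * 32)       # unconstrained linear map

assert conv(x).shape == (1, 8, 32, 32)

n_conv = sum(p.numel() for p in conv.parameters())
n_dense = sum(p.numel() for p in dense.parameters())
print(n_conv, n_dense)  # 224 vs. ~25 million parameters for the same shapes
```

The convolution needs roughly five orders of magnitude fewer parameters precisely because of the structural assumption it builds in; that restriction of the hypothesis space is what an inductive bias is.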
