Differentiable Programs with Neural Libraries

We develop a framework for combining differentiable programming languages with neural networks. Using this framework we create end-to-end trainable systems that learn to write interpretable algorithms with perceptual components. We explore the benefits of inductive biases for strong generalization and modularity that come from the program-like structure of our models. In particular, modularity allows us to learn a library of (neural) functions which grows and improves as more tasks are solved. Empirically, we show that this leads to lifelong learning systems that transfer knowledge to new tasks more effectively than baselines.
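To make the idea concrete, here is a minimal, hypothetical sketch (in PyTorch, not the paper's actual TerpreT/TensorFlow implementation) of the core ingredient described above: a shared "library" neural function that is called from a fixed, interpretable, differentiable program, so gradients from the task loss train the perceptual component end-to-end. The names `LibraryFn` and `SumProgram` and the two-digit-sum task are illustrative assumptions, not artifacts from the paper.

```python
# Illustrative sketch only: a shared neural library function reused inside a
# differentiable program, trained end-to-end from the task loss.
import torch
import torch.nn as nn

class LibraryFn(nn.Module):
    """Shared perceptual library function: image -> soft symbol distribution.
    Reused (and further improved) as new tasks are solved."""
    def __init__(self, n_symbols=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(), nn.Linear(28 * 28, 128), nn.ReLU(),
            nn.Linear(128, n_symbols))

    def forward(self, img):
        return self.net(img).softmax(dim=-1)

class SumProgram(nn.Module):
    """A fixed, interpretable 'program' for one task: call the library
    function on each input image and combine the soft symbols with a
    differentiable rule (here, the expected sum of two digits)."""
    def __init__(self, lib):
        super().__init__()
        self.lib = lib
        self.values = torch.arange(10.0)

    def forward(self, img_a, img_b):
        p_a, p_b = self.lib(img_a), self.lib(img_b)
        return p_a @ self.values + p_b @ self.values

lib = LibraryFn()
program = SumProgram(lib)
opt = torch.optim.Adam(lib.parameters(), lr=1e-3)

# Dummy batch standing in for real image data and labels.
img_a, img_b = torch.randn(32, 1, 28, 28), torch.randn(32, 1, 28, 28)
target = torch.randint(0, 19, (32,)).float()

loss = nn.functional.mse_loss(program(img_a, img_b), target)
loss.backward()   # gradients flow through the program into the library
opt.step()
```

Because `lib` is shared, a second task's program could call the same module, which is the sense in which the library grows and improves across tasks in a lifelong-learning setting.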
