Learning Compositional Neural Programs with Recursive Tree Search and Planning

We propose a novel reinforcement learning algorithm, AlphaNPI, that combines the strengths of Neural Programmer-Interpreters (NPI) and AlphaZero. NPI contributes structural biases in the form of modularity, hierarchy and recursion, which help reduce sample complexity, improve generalization and increase interpretability. AlphaZero contributes powerful neural-network-guided search algorithms, which we augment with recursion. AlphaNPI assumes only a hierarchical program specification with sparse rewards: 1 when the program execution satisfies the specification, and 0 otherwise. Using this specification, AlphaNPI is able to train NPI models effectively with RL for the first time, completely eliminating the need for strong supervision in the form of execution traces. The experiments show that AlphaNPI can sort as well as previous strongly supervised NPI variants. The AlphaNPI agent is also trained on a Tower of Hanoi puzzle with two disks and is shown to generalize to puzzles with an arbitrary number of disks.
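To make the sparse reward concrete, the following is a minimal sketch and not the AlphaNPI implementation: it contains no neural network and no tree search. A hand-written recursive Tower of Hanoi program stands in for a learned program, and the reward is 1 only if the final state satisfies the specification. All names here (`State`, `sparse_reward`, `hanoi`, `solved`, `run`) and the peg labels are assumptions made for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical state encoding: peg name -> disks listed bottom to top.
State = Dict[str, List[int]]


def sparse_reward(state: State, postcondition: Callable[[State], bool]) -> float:
    """Return 1.0 if the final state satisfies the specification, else 0.0."""
    return 1.0 if postcondition(state) else 0.0


def hanoi(n: int, src: str, aux: str, dst: str, state: State) -> None:
    """Hand-written recursive program that moves n disks from src to dst."""
    if n == 0:
        return
    hanoi(n - 1, src, dst, aux, state)   # park the n-1 smaller disks on aux
    state[dst].append(state[src].pop())  # move the largest remaining disk
    hanoi(n - 1, aux, src, dst, state)   # bring the smaller disks onto dst


def solved(n: int) -> Callable[[State], bool]:
    """Specification: all n disks sit on peg C, largest at the bottom."""
    target = list(range(n, 0, -1))
    return lambda s: s["C"] == target and not s["A"] and not s["B"]


def run(n: int) -> float:
    """Execute the program on an n-disk puzzle and score the final state."""
    state: State = {"A": list(range(n, 0, -1)), "B": [], "C": []}
    hanoi(n, "A", "B", "C", state)
    return sparse_reward(state, solved(n))
```

Because the program is recursive, the same definition that solves the two-disk puzzle also solves larger ones, and the reward signal only ever reports whether the final configuration meets the specification.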
