Parametrized Hierarchical Procedures for Neural Programming

Neural programs are highly accurate and structured policies that perform algorithmic tasks by controlling the behavior of a computation mechanism. Despite their potential to increase the interpretability and compositionality of an artificial agent's behavior, it remains difficult to learn, from demonstrations, neural networks that represent computer programs. The main challenges that set algorithmic domains apart from other imitation learning domains are the need for high accuracy, the reliance on specific data structures, and the extremely limited observability. To address these challenges, we propose to model programs as Parametrized Hierarchical Procedures (PHPs). A PHP is a sequence of conditional operations that uses a program counter, together with the current observation, to select between taking an elementary action, invoking another PHP as a sub-procedure, and returning to the caller. We develop an algorithm for training PHPs from a set of supervisor demonstrations, only some of which are annotated with the internal call structure, and apply it to efficient level-wise training of multi-level PHPs. On two benchmarks, NanoCraft and long-hand addition, we show that PHPs learn neural programs more accurately from smaller amounts of both annotated and unannotated demonstrations.
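To make the control flow concrete, the sketch below shows one way a PHP hierarchy could be executed with an explicit call stack. This is an illustrative assumption, not the authors' implementation: each PHP is modeled as a function mapping a program counter and an observation to one of three operations ("act", "call", or "return"), and the hierarchy is run with a stack of (PHP, counter) frames.

```python
# Minimal sketch (assumed interface, not the paper's code) of executing a
# hierarchy of Parametrized Hierarchical Procedures (PHPs).

from typing import Callable, List, Tuple

# A PHP is modeled as a function from (program_counter, observation) to an
# operation: ("act", action), ("call", sub_php), or ("return", None).
Operation = Tuple[str, object]
PHP = Callable[[int, object], Operation]

def run_php(root: PHP, observe: Callable[[], object],
            act: Callable[[object], None], max_steps: int = 1000) -> None:
    """Execute a PHP hierarchy with an explicit call stack.

    Each stack frame holds the active PHP and its program counter.
    """
    stack: List[Tuple[PHP, int]] = [(root, 0)]    # call stack of (PHP, counter)
    for _ in range(max_steps):
        if not stack:
            return                                # root PHP returned: done
        php, counter = stack[-1]
        op, arg = php(counter, observe())         # decide from counter + observation
        if op == "act":
            act(arg)                              # take an elementary action
            stack[-1] = (php, counter + 1)        # advance the program counter
        elif op == "call":
            stack[-1] = (php, counter + 1)        # resume here after the callee
            stack.append((arg, 0))                # invoke sub-procedure
        elif op == "return":
            stack.pop()                           # return control to the caller
```

In the paper's setting the per-PHP decision would be produced by a trained network conditioned on the program counter and observation; the stack mechanics above only illustrate the call/return semantics described in the abstract.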
