Memory Augmented Policy Optimization for Program Synthesis with Generalization

This paper presents Memory Augmented Policy Optimization (MAPO): a novel policy optimization formulation that incorporates a memory buffer of promising trajectories to reduce the variance of policy gradient estimates in deterministic environments with discrete actions. The formulation expresses the expected return objective as a weighted sum of two terms: an expectation over a memory of high-reward trajectories, and a separate expectation over the trajectories outside the memory. We propose three techniques to make training with MAPO efficient: (1) distributed sampling from inside and outside the memory with an actor-learner architecture; (2) a marginal likelihood constraint over the memory to accelerate training; (3) systematic exploration to discover high-reward trajectories. MAPO improves the sample efficiency and robustness of policy gradient methods, especially on tasks with sparse rewards. We evaluate MAPO on weakly supervised program synthesis from natural language, with an emphasis on generalization. On the WikiTableQuestions benchmark, we improve the state of the art by 2.5%, achieving an accuracy of 46.2%; on the WikiSQL benchmark, MAPO achieves an accuracy of 74.9% with only weak supervision, outperforming several strong baselines trained with full supervision. Our code is open-sourced at this https URL.
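Concretely, the weighted-sum formulation above can be sketched as the following decomposition of the expected return; the notation here (memory buffer \mathcal{B}, memory weight \pi_B, reward R) is ours and is meant only as an illustration of the idea, not the paper's exact equations:

O(\theta) \;=\; \sum_{a} \pi_\theta(a \mid x)\, R(a)
\;=\; \pi_B\, \mathbb{E}_{a \sim \pi_\theta^{+}}\!\left[R(a)\right] \;+\; (1 - \pi_B)\, \mathbb{E}_{a \sim \pi_\theta^{-}}\!\left[R(a)\right],
\qquad \pi_B = \sum_{a \in \mathcal{B}} \pi_\theta(a \mid x),

where \pi_\theta^{+} and \pi_\theta^{-} denote the policy renormalized inside and outside the memory \mathcal{B}. Because \mathcal{B} holds only a small set of high-reward trajectories in a deterministic environment, the first expectation can be enumerated (or sampled in a stratified way) rather than crudely estimated, which is what reduces the variance of the policy gradient; the second expectation is estimated by sampling from the policy and rejecting trajectories already in \mathcal{B}.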
