Improving a Neural Semantic Parser by Counterfactual Learning from Human Bandit Feedback

Counterfactual learning from human bandit feedback describes a scenario where user feedback on the quality of outputs of a historic system is logged and used to improve a target system. We show how to apply this learning framework to neural semantic parsing. From a machine learning perspective, the key challenge lies in a proper reweighting of the estimator so as to avoid known degeneracies in counterfactual learning, while still being applicable to stochastic gradient optimization. To conduct experiments with human users, we devise an easy-to-use interface to collect human feedback on semantic parses. Our work is the first to show that semantic parsers can be improved significantly by counterfactual learning from logged human feedback data.
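The abstract does not state the estimator itself, so the following is only a minimal sketch of what reweighting can look like. It assumes a deterministic logging system (each logged parse counts as having been chosen with probability 1), one scalar feedback value per logged parse, and a self-normalized form that divides by the summed target-model probabilities. The function names and the toy numbers are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def unweighted_estimate(rewards: Sequence[float], probs: Sequence[float]) -> float:
    """Mean of feedback times target-model probability over the logged outputs.

    Assumes deterministic logging, so no division by a logging propensity.
    """
    return sum(r * p for r, p in zip(rewards, probs)) / len(rewards)


def reweighted_estimate(rewards: Sequence[float], probs: Sequence[float]) -> float:
    """Self-normalized variant: divide by the summed target-model probabilities.

    The normalization removes the incentive to raise the probability of every
    logged output regardless of the feedback it received.
    """
    total = sum(probs)
    if total == 0.0:
        raise ValueError("target-model probabilities sum to zero")
    return sum(r * p for r, p in zip(rewards, probs)) / total


# Hypothetical log: feedback in {0, 1} and the target model's probability
# of each logged parse.
rewards = [1.0, 0.0, 1.0]
probs = [0.2, 0.5, 0.3]

# Doubling every probability inflates the unweighted estimate but leaves the
# reweighted one unchanged, which is the degeneracy the normalization targets.
inflated = [2.0 * p for p in probs]
print(unweighted_estimate(rewards, probs), unweighted_estimate(rewards, inflated))
print(reweighted_estimate(rewards, probs), reweighted_estimate(rewards, inflated))
```

In this sketch the normalizer is computed over whichever examples are passed in. With stochastic gradient optimization those would be minibatches, and how to obtain a normalizer that remains valid in that setting is the challenge the abstract refers to; the sketch does not attempt to solve it.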
