Reward Learning for Efficient Reinforcement Learning in Extractive Document Summarisation

Document summarisation can be formulated as a sequential decision-making problem and solved with Reinforcement Learning (RL) algorithms. The predominant RL paradigm for summarisation learns a cross-input policy, which requires considerable time, data and parameter tuning due to the huge search spaces and the delayed rewards. Learning input-specific RL policies is a more efficient alternative, but existing approaches depend on handcrafted rewards, which are difficult to design and yield poor performance. We propose RELIS, a novel RL paradigm that learns a reward function with Learning-to-Rank (L2R) algorithms at training time and uses this reward function to train an input-specific RL policy at test time. We prove that, with appropriate L2R and RL algorithms, RELIS is guaranteed to generate near-optimal summaries. Empirically, we evaluate our approach on extractive multi-document summarisation and show that RELIS reduces training time by two orders of magnitude compared to state-of-the-art models while performing on par with them.
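The abstract describes a two-stage recipe: fit a reward function with pairwise L2R on the training inputs, then optimise a small policy for each test input against that learned reward. Below is a minimal, self-contained sketch of that recipe under strong simplifying assumptions; it is illustrative only, not the authors' implementation. The linear pairwise-ranking reward, the toy sentence features, and all names (`train_reward`, `train_policy`, etc.) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def features(sentence_ids, doc_vectors):
    # Toy summary representation: mean of the selected sentences' vectors.
    return doc_vectors[list(sentence_ids)].mean(axis=0)

# Stage 1 (training time): pairwise Learning-to-Rank reward model.
def train_reward(pairs, dim, epochs=200, lr=0.1):
    """pairs: list of (better_feats, worse_feats) built from ranked summaries."""
    w = np.zeros(dim)
    for _ in range(epochs):
        for better, worse in pairs:
            if w @ better - w @ worse < 1.0:   # margin-ranking (hinge) loss
                w += lr * (better - worse)     # gradient step that widens the margin
    return w

# Stage 2 (test time): input-specific policy trained with REINFORCE,
# using only the learned reward (no reference summary needed).
def train_policy(doc_vectors, w, summary_len=3, episodes=500, lr=0.05):
    n = len(doc_vectors)
    theta = np.zeros(n)            # one logit per sentence of THIS input document
    baseline = 0.0
    for _ in range(episodes):
        probs = np.exp(theta - theta.max())
        probs /= probs.sum()
        picked = rng.choice(n, size=summary_len, replace=False, p=probs)
        reward = float(w @ features(picked, doc_vectors))
        baseline = 0.9 * baseline + 0.1 * reward        # running-average baseline
        grad = -probs * summary_len                     # approximate score-function
        grad[picked] += 1.0                             # gradient (treats picks as
        theta += lr * (reward - baseline) * grad        # independent softmax draws)
    return sorted(np.argsort(-theta)[:summary_len])     # extract top-scoring sentences

if __name__ == "__main__":
    dim, n_sents = 8, 12
    doc = rng.normal(size=(n_sents, dim))      # stand-in sentence embeddings
    target = rng.normal(size=dim)              # hidden "quality" direction for toy data
    # Build toy ranked training pairs: summaries aligned with `target` are "better".
    cands = [rng.choice(n_sents, 3, replace=False) for _ in range(40)]
    ranked = sorted(cands, key=lambda s: float(target @ features(s, doc)), reverse=True)
    pairs = [(features(ranked[i], doc), features(ranked[j], doc))
             for i in range(10) for j in range(30, 40)]
    w = train_reward(pairs, dim)
    print("extracted sentence indices:", train_policy(doc, w))
```

The point the sketch mirrors is that the test-time policy never consults a reference summary; it only queries the reward model that was fitted at training time.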
