Reward Learning for Efficient Reinforcement Learning in Extractive Document Summarisation

Document summarisation can be formulated as a sequential decision-making problem and solved with Reinforcement Learning (RL) algorithms. The predominant RL paradigm for summarisation learns a cross-input policy, which requires considerable time, data and parameter tuning due to the huge search spaces and the delayed rewards. Learning input-specific RL policies is a more efficient alternative, but existing approaches depend on handcrafted rewards, which are difficult to design and yield poor performance. We propose RELIS, a novel RL paradigm that learns a reward function with Learning-to-Rank (L2R) algorithms at training time and uses this reward function to train an input-specific RL policy at test time. We prove that, with appropriate L2R and RL algorithms, RELIS is guaranteed to generate near-optimal summaries. Empirically, we evaluate our approach on extractive multi-document summarisation and show that RELIS reduces training time by two orders of magnitude compared to state-of-the-art models while performing on par with them.
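The abstract describes a two-stage recipe: fit a reward function with pairwise L2R on the training inputs, then optimise a small policy for each test input against that learned reward. Below is a minimal, self-contained sketch of that recipe under strong simplifying assumptions; it is illustrative only, not the authors' implementation. The linear pairwise-ranking reward, the toy sentence features, and all names (`train_reward`, `train_policy`, etc.) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def features(sentence_ids, doc_vectors):
    # Toy summary representation: mean of the selected sentences' vectors.
    return doc_vectors[list(sentence_ids)].mean(axis=0)

# Stage 1 (training time): pairwise Learning-to-Rank reward model.
def train_reward(pairs, dim, epochs=200, lr=0.1):
    """pairs: list of (better_feats, worse_feats) built from ranked summaries."""
    w = np.zeros(dim)
    for _ in range(epochs):
        for better, worse in pairs:
            if w @ better - w @ worse < 1.0:   # margin-ranking (hinge) loss
                w += lr * (better - worse)     # gradient step that widens the margin
    return w

# Stage 2 (test time): input-specific policy trained with REINFORCE,
# using only the learned reward (no reference summary needed).
def train_policy(doc_vectors, w, summary_len=3, episodes=500, lr=0.05):
    n = len(doc_vectors)
    theta = np.zeros(n)            # one logit per sentence of THIS input document
    baseline = 0.0
    for _ in range(episodes):
        probs = np.exp(theta - theta.max())
        probs /= probs.sum()
        picked = rng.choice(n, size=summary_len, replace=False, p=probs)
        reward = float(w @ features(picked, doc_vectors))
        baseline = 0.9 * baseline + 0.1 * reward        # running-average baseline
        grad = -probs * summary_len                     # approximate score-function
        grad[picked] += 1.0                             # gradient (treats picks as
        theta += lr * (reward - baseline) * grad        # independent softmax draws)
    return sorted(np.argsort(-theta)[:summary_len])     # extract top-scoring sentences

if __name__ == "__main__":
    dim, n_sents = 8, 12
    doc = rng.normal(size=(n_sents, dim))      # stand-in sentence embeddings
    target = rng.normal(size=dim)              # hidden "quality" direction for toy data
    # Build toy ranked training pairs: summaries aligned with `target` are "better".
    cands = [rng.choice(n_sents, 3, replace=False) for _ in range(40)]
    ranked = sorted(cands, key=lambda s: float(target @ features(s, doc)), reverse=True)
    pairs = [(features(ranked[i], doc), features(ranked[j], doc))
             for i in range(10) for j in range(30, 40)]
    w = train_reward(pairs, dim)
    print("extracted sentence indices:", train_policy(doc, w))
```

The point the sketch mirrors is that the test-time policy never consults a reference summary; it only queries the reward model that was fitted at training time.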
