A Neural Attention Model for Abstractive Sentence Summarization

Summarization based on text extraction is inherently limited, but generation-style abstractive methods have proven challenging to build. In this work, we propose a fully data-driven approach to abstractive sentence summarization. Our method utilizes a local attention-based model that generates each word of the summary conditioned on the input sentence. While the model is structurally simple, it can easily be trained end-to-end and scales to a large amount of training data. The model shows significant performance gains on the DUC-2004 shared task compared with several strong baselines.
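The abstract describes a local attention-based encoder-decoder: at each step, the model attends over the input sentence and generates the next summary word conditioned on the resulting context. The sketch below is a minimal numpy illustration of that attention step, not the authors' implementation; the dimensions, the parameter names (F, G, P), and the function attention_encoder are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of a local attention-based encoder for sentence
# summarization. All names and dimensions are illustrative assumptions,
# not taken from the paper or its released code.

rng = np.random.default_rng(0)
V, d = 1000, 64            # vocabulary size, embedding dimension (assumed)
C = 3                      # decoder context window: previous summary words

F = rng.normal(scale=0.1, size=(V, d))      # input-side word embeddings
G = rng.normal(scale=0.1, size=(V, d))      # context-side word embeddings
P = rng.normal(scale=0.1, size=(d, C * d))  # attention/alignment weights

def softmax(z):
    z = z - z.max()                # numerical stability
    e = np.exp(z)
    return e / e.sum()

def attention_encoder(x_ids, ctx_ids):
    """Soft attention over the input, conditioned on the decoder context.

    x_ids:   token ids of the input sentence, shape (M,)
    ctx_ids: ids of the C most recent summary words, shape (C,)
    Returns a d-dimensional context vector for predicting the next word.
    """
    X = F[x_ids]                   # (M, d) input embeddings
    y_c = G[ctx_ids].reshape(-1)   # (C*d,) concatenated context embedding
    scores = X @ (P @ y_c)         # (M,) alignment score per input word
    p = softmax(scores)            # attention distribution over the input
    return p @ X                   # attention-weighted input representation

# Example: attend over a 7-word "sentence" given a 3-word summary context.
x = rng.integers(0, V, size=7)
ctx = rng.integers(0, V, size=C)
enc = attention_encoder(x, ctx)
print(enc.shape)                   # (64,)
```

In the full model the abstract alludes to, a context vector like this would be combined with the embedded decoder context to score the next summary word, and the whole system would be trained end-to-end on input-summary pairs.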
