Neural Headline Generation with Minimum Risk Training

Automatic headline generation is an important research area within text summarization and sentence compression. Recently, neural headline generation models have been proposed to exploit the strength of neural networks in learning sentence representations and in sequence-to-sequence mapping. However, these models are conventionally trained with maximum likelihood estimation, which constrains the training objective to the word level rather than the sentence level. Moreover, prediction performance depends heavily on the training data distribution. To overcome these drawbacks, we employ a minimum risk training strategy, which directly optimizes model parameters with respect to the evaluation metric and leads to statistically significant improvements for headline generation. Experimental results show that our approach outperforms state-of-the-art systems on both English and Chinese headline generation tasks.
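As a minimal sketch of what "directly optimizing with respect to the evaluation metric" amounts to, the standard minimum risk training objective can be written as follows; the symbols D, S(x), Q, \Delta, and \alpha are illustrative notation introduced here, not taken from the paper itself.

% Minimum risk training: minimize the expected loss (risk) under the model
% distribution, where the loss \Delta(y', y) is derived from the evaluation
% metric (e.g., negative ROUGE between a candidate headline y' and the
% reference headline y).
\mathcal{R}(\theta) = \sum_{(x, y) \in D} \mathbb{E}_{y' \sim p(\cdot \mid x; \theta)} \big[ \Delta(y', y) \big]
% The expectation over all possible headlines is intractable, so it is
% approximated on a sampled candidate set S(x), with probabilities
% renormalized over that set and sharpened by a hyperparameter \alpha:
\approx \sum_{(x, y) \in D} \sum_{y' \in S(x)} Q(y' \mid x; \theta, \alpha) \, \Delta(y', y),
\quad \text{where} \quad
Q(y' \mid x; \theta, \alpha) = \frac{p(y' \mid x; \theta)^{\alpha}}{\sum_{y'' \in S(x)} p(y'' \mid x; \theta)^{\alpha}}.

Because Q is differentiable in \theta, this risk can be minimized with standard gradient-based training, which is what allows the model to be tuned toward a sentence-level metric rather than word-level likelihood.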
