Summarization with Highway Conditional Random Pointer-Generator Network

Neural sequence-to-sequence models have become the standard approach for abstractive text summarization, producing summaries that read like natural language rather than mere excerpts recomposed from the original text. However, these models still have shortcomings: they handle out-of-vocabulary (OOV) words poorly, and they tend to repeat themselves or lose coherence. Moreover, because of the inherent characteristics of recurrent neural networks, they begin to degrade even at shallow depths (only two layers), which prevents them from extracting the deeper features needed to improve accuracy. To address these problems, we propose the Highway Conditional Random Pointer-Generator Network (HCRPGN). We introduce a CRF layer to mitigate the repetition problem and use a highway recurrent cell to optimize the neuron structure and prevent model degradation. We apply our model to the CNN/Daily Mail summarization task, outperforming the current abstractive state of the art by at least 1 ROUGE point.
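As a point of reference, the sketch below shows the standard highway gating from Highway Networks (Srivastava et al., 2015), y = T(x) · H(x) + (1 − T(x)) · x, applied to a hidden state. It is only an illustration: the abstract does not specify how HCRPGN wires this gating into its recurrent cell, so the module name, dimensions, and initialization here are assumptions.

```python
import torch
import torch.nn as nn


class HighwayLayer(nn.Module):
    """Standard highway transformation: y = T(x) * H(x) + (1 - T(x)) * x.
    Hypothetical illustration; the exact cell design in HCRPGN is not given
    in the abstract."""

    def __init__(self, dim: int):
        super().__init__()
        self.transform = nn.Linear(dim, dim)  # H(x): candidate update
        self.gate = nn.Linear(dim, dim)       # T(x): transform gate
        # Bias the gate toward carrying the input through at initialization,
        # the usual trick for training very deep highway stacks.
        nn.init.constant_(self.gate.bias, -2.0)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = torch.tanh(self.transform(x))      # candidate features
        t = torch.sigmoid(self.gate(x))        # how much to transform
        return t * h + (1.0 - t) * x           # carry the rest unchanged


if __name__ == "__main__":
    layer = HighwayLayer(dim=256)
    hidden = torch.randn(8, 256)               # e.g. a batch of decoder states
    print(layer(hidden).shape)                 # torch.Size([8, 256])
```

Because the gate can default to copying its input, stacking such layers lets gradients flow through many layers without the degradation the abstract attributes to plain recurrent cells.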