MS-Pointer Network: Abstractive Text Summary Based on Multi-Head Self-Attention

Abstractive text summarization plays an important role in natural language processing. However, deep-learning approaches that predict summary words one at a time often produce summaries that are semantically inaccurate or repetitive. To address the problem of semantic inaccuracy, we propose the MS-Pointer Network, which introduces a multi-head self-attention mechanism into the basic encoder-decoder model. Because multi-head self-attention can relate arbitrary pairs of input words in the encoder-decoder and assign higher weights to semantically related combinations, it enhances the semantic features of the text and yields summaries with a more coherent semantic structure. The multi-head self-attention mechanism also incorporates the positional information of the input text, which further strengthens its semantic representation. To handle out-of-vocabulary words, a pointer network is added to the sequence-to-sequence model with multi-head attention; we refer to the resulting model as the MS-Pointer Network. We validate the model on the CNN/Daily Mail and Gigaword datasets and evaluate it with the ROUGE metric. Experiments show that summaries generated with the multi-head self-attention mechanism outperform the current open state of the art by about two points on average.
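The abstract combines two mechanisms: multi-head self-attention over the encoder inputs and a pointer network that copies source words to handle out-of-vocabulary tokens. The sketch below is a minimal, illustrative rendering of those two ingredients (it is not the paper's implementation); the dimensions, random projection matrices, and the `p_gen` mixing weight are assumptions for demonstration.

```python
# Minimal sketch of (1) multi-head self-attention and (2) pointer-generator mixing,
# the two components named in the abstract. All weights are randomly initialised
# here purely for illustration.
import numpy as np

def multi_head_self_attention(X, num_heads=4):
    """X: (seq_len, d_model) input embeddings. Returns (seq_len, d_model)."""
    seq_len, d_model = X.shape
    assert d_model % num_heads == 0
    d_k = d_model // num_heads
    rng = np.random.default_rng(0)
    # Per-head query/key/value projections and an output projection (illustrative).
    W_q = rng.normal(size=(num_heads, d_model, d_k))
    W_k = rng.normal(size=(num_heads, d_model, d_k))
    W_v = rng.normal(size=(num_heads, d_model, d_k))
    W_o = rng.normal(size=(num_heads * d_k, d_model))
    heads = []
    for h in range(num_heads):
        Q, K, V = X @ W_q[h], X @ W_k[h], X @ W_v[h]
        scores = Q @ K.T / np.sqrt(d_k)                     # scaled dot-product
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)      # softmax over positions
        heads.append(weights @ V)                           # each word attends to all words
    return np.concatenate(heads, axis=-1) @ W_o             # concatenate heads, project back

def pointer_generator_mix(p_vocab, attn_dist, src_ids, vocab_size, p_gen):
    """Blend generating from the vocabulary with copying from the source:
    P(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum of attention on source copies of w."""
    p_final = p_gen * p_vocab
    p_copy = np.zeros(vocab_size)
    np.add.at(p_copy, src_ids, attn_dist)   # scatter attention mass onto source token ids
    return p_final + (1.0 - p_gen) * p_copy
```

In this reading, copying via the attention distribution is what lets the decoder emit source words that fall outside the fixed vocabulary, while `p_gen` (learned in the actual model) controls how much probability mass goes to generation versus copying at each decoding step.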
