Aligning Gaussian-Topic with Embedding Network for Summarization Ranking

Query-oriented summarization addresses the problem of information overload and help people get the main ideas within a short time. Summaries are composed by sentences. So, the basic idea of composing a salient summary is to construct quality sentences both for user specific queries and multiple documents. Sentence embedding has been shown effective in summarization tasks. However, these methods lack of the latent topic structure of contents. Hence, the summary lies only on vector space can hardly capture multi-topical content. In this paper, our proposed model incorporates the topical aspects and continuous vector representations, which jointly learns semantic rich representations encoded by vectors. Then, leveraged by topic filtering and embedding ranking model, the summarization can select desirable salient sentences. Experiments demonstrate outstanding performance of our proposed model from the perspectives of prominent topics and semantic coherence.

[1]  Larry P. Heck,et al.  Contextual LSTM (CLSTM) models for Large scale NLP tasks , 2016, ArXiv.

[2]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[3]  Franz Josef Och,et al.  Minimum Error Rate Training in Statistical Machine Translation , 2003, ACL.

[4]  Eduard H. Hovy,et al.  The Automated Acquisition of Topic Signatures for Text Summarization , 2000, COLING.

[5]  Dianne P. O'Leary,et al.  Topic-Focused Multi-Document Summarization Using an Approximate Oracle Score , 2006, ACL.

[6]  Daraksha Parveen,et al.  Topical Coherence for Graph-based Extractive Summarization , 2015, EMNLP.

[7]  Zhiyuan Liu,et al.  Topical Word Embeddings , 2015, AAAI.

[8]  Sanda M. Harabagiu,et al.  Topic themes for multi-document summarization , 2005, SIGIR '05.

[9]  Yan Liu,et al.  Query-Oriented Multi-Document Summarization via Unsupervised Deep Learning , 2012, AAAI.

[10]  Regina Barzilay,et al.  Catching the Drift: Probabilistic Content Models, with Applications to Generation and Summarization , 2004, NAACL.

[11]  Hsin-Hsi Chen,et al.  Leveraging word embeddings for spoken document summarization , 2015, INTERSPEECH.

[12]  Quoc V. Le,et al.  Distributed Representations of Sentences and Documents , 2014, ICML.

[13]  Min Yang,et al.  Ordering-Sensitive and Semantic-Aware Topic Modeling , 2015, AAAI.

[14]  Heng Ji,et al.  A Novel Neural Topic Model and Its Supervised Extension , 2015, AAAI.

[15]  Ani Nenkova,et al.  A compositional context sensitive multi-document summarizer: exploring the factors that influence summarization , 2006, SIGIR.

[16]  Chris H. Q. Ding,et al.  Multi-document summarization via sentence-level semantic analysis and symmetric matrix factorization , 2008, SIGIR '08.

[17]  Rajarshi Das,et al.  Gaussian LDA for Topic Models with Word Embeddings , 2015, ACL.

[18]  Ani Nenkova,et al.  Measuring Importance and Query Relevance in Topic-focused Multi-document Summarization , 2007, ACL.

[19]  Jie Tang,et al.  Multi-topic Based Query-Oriented Summarization , 2009, SDM.

[20]  Hayato Kobayashi,et al.  Summarization Based on Embedding Distributions , 2015, EMNLP.

[21]  Michel Galley,et al.  A Skip-Chain Conditional Random Field for Ranking Meeting Utterances by Importance , 2006, EMNLP.

[22]  Eduard H. Hovy,et al.  Automatic Evaluation of Summaries Using N-gram Co-occurrence Statistics , 2003, NAACL.

[23]  Jeffrey Dean,et al.  Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.

[24]  Devdatt P. Dubhashi,et al.  Extractive Summarization using Continuous Vector Space Models , 2014, CVSC@EACL.

[25]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[26]  Wenpeng Yin,et al.  Optimizing Sentence Modeling and Selection for Document Summarization , 2015, IJCAI.

[27]  Joshua Goodman,et al.  Multi-Document Summarization by Maximizing Informative Content-Words , 2007, IJCAI.