A Hybrid Topic Model for Multi-Document Summarization

SUMMARY Topic features are useful for improving text summarization. However, most topic models assume that topics are independent of one another, which is a strong restriction; relaxing it allows text structure to be captured more deeply. This paper proposes a hybrid topic model that generates multi-document summaries by combining the Hidden Topic Markov Model (HTMM), the surface texture model, and the topic transition model. Based on the topic transition model, regular topic transition probabilities are used during summary generation. This approach removes the topic independence assumption of the Latent Dirichlet Allocation (LDA) model. Experimental results show the advantage of combining the three kinds of models. The contributions of this paper include alleviating topic independence and integrating the surface texture and shallow semantics of documents to improve summarization. In short, this paper attempts to realize an advanced summarization system.
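
As a rough illustration of what a topic transition probability is, the sketch below estimates a first-order transition matrix from sequences of per-sentence topic labels. It is not the model proposed in the paper: the summary does not say how the transition probabilities are estimated, so the function name `transition_matrix`, the integer topic labels, and the additive smoothing are all assumptions made for this example.

```python
def transition_matrix(sequences, num_topics, smoothing=1.0):
    """Estimate P(next topic | previous topic) from labelled sentence sequences.

    sequences  : list of lists of topic ids (one id per sentence, in document order)
    num_topics : number of distinct topics, ids are 0 .. num_topics - 1
    smoothing  : additive (Laplace) smoothing constant, an assumption of this sketch
    """
    # Start every cell at the smoothing value so no transition has zero probability.
    counts = [[smoothing] * num_topics for _ in range(num_topics)]

    # Count each adjacent pair of sentence topics within a document.
    for seq in sequences:
        for prev_topic, next_topic in zip(seq, seq[1:]):
            counts[prev_topic][next_topic] += 1.0

    # Normalise each row so that it sums to one.
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total for c in row])
    return matrix


# Two toy documents whose sentences were assigned topics 0 or 1 (hypothetical data).
docs = [[0, 0, 1], [0, 1, 1]]
probs = transition_matrix(docs, num_topics=2)
```

A matrix of this kind captures dependence between the topics of neighbouring sentences, which is the kind of information that a model assuming independent topics cannot represent.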
