Topic-driven reader comments summarization

Readers of a news article often read the comments contributed by other readers. From these comments, readers obtain not only complementary information about the article but also the opinions of other readers. However, existing ranking mechanisms for comments (e.g., by recency or by user rating) fail to offer an overall picture of the topics discussed in the comments. In this paper, we propose to study the Topic-driven Reader Comments Summarization (Torcs) problem. We observe that many news articles from a news stream are related to each other, and so are their comments. Hence, news articles and their associated comments provide context information for user commenting. To implicitly capture this context information, we propose two topic models to address the Torcs problem, namely, the Master-Slave Topic Model (MSTM) and the Extended Master-Slave Topic Model (EXTM). Both models treat a news article as a master document and each of its comments as a slave document. MSTM constrains the topics discussed in comments to be derived from the news article being commented on. EXTM, on the other hand, allows the words of a comment to be generated from both the topics derived from the news article being commented on and the topics derived from the comments themselves. Both models are used to group comments into topic clusters. We then use two ranking mechanisms, Maximal Marginal Relevance (MMR) and Rating & Length (RL), to select a few of the most representative comments from each comment cluster. To evaluate the two models, we conducted experiments on 1005 Yahoo! News articles with more than one million comments. Our experimental results show that EXTM significantly outperforms MSTM in terms of perplexity. Through a user study, we also confirm that the comment summaries generated by EXTM achieve better intra-cluster topic cohesion and inter-cluster topic diversity.
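As an illustration of the MMR selection step mentioned above, the following is a minimal sketch of greedy MMR over the comments of a single cluster. It is not the implementation used in the paper: the bag-of-words cosine similarity, the use of a cluster description string as the query, the trade-off parameter `lam`, and the function names are all assumptions made for this example.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(comments, query, k=2, lam=0.5):
    """Greedily pick k comments, trading relevance against redundancy.

    comments: comment strings from one topic cluster
    query:    hypothetical cluster description (e.g. its top topic words)
    lam:      assumed trade-off; 1.0 ranks by relevance only
    """
    vecs = [Counter(c.lower().split()) for c in comments]
    q = Counter(query.lower().split())
    selected = []
    candidates = list(range(len(comments)))

    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(vecs[i], q)
            # Redundancy: highest similarity to any already selected comment.
            redundancy = max(
                (cosine(vecs[i], vecs[j]) for j in selected), default=0.0
            )
            return lam * relevance - (1.0 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)

    return [comments[i] for i in selected]


example_comments = [
    "tax cut helps economy",
    "tax cut helps the economy",
    "election debate tonight",
]
summary = mmr_select(example_comments, "tax cut economy election", k=2, lam=0.5)
```

With `lam=0.5`, the near-duplicate second comment is penalized for its overlap with the first pick, so the selection favors a comment covering a different aspect of the cluster. Setting `lam=1.0` reduces the procedure to plain relevance ranking.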
