Twitter Summarization Based on Social Network and Sparse Reconstruction

With the rapid growth of microblogging services such as Twitter, millions of users produce a vast number of short and noisy messages, which makes it difficult for people to quickly grasp the essential information on topics that interest them. In this paper, we study extractive topic-oriented Twitter summarization as a solution to this problem. Traditional summarization methods consider only text information, which is insufficient in the social media setting. Existing Twitter summarization techniques rarely explore relations between tweets explicitly, ignoring the fact that information can spread along the social network. Inspired by social theories stating that expression consistency and expression contagion are observed in social networks, we propose a novel approach to Twitter summarization for short and noisy text that integrates Social Network and Sparse Reconstruction (SNSR). We explore whether social relations can help Twitter summarization by modeling relations between tweets as a social regularization term and integrating it into a group sparse optimization framework. The framework conducts a sparse reconstruction process that selects the tweets that best reconstruct the original tweets in a specific topic, while accounting for coverage and sparsity. We also design a diversity regularization term to remove redundancy. In particular, we present a mathematical optimization formulation and develop an efficient algorithm to solve it. Because no public corpus is available, we construct gold standard Twitter summary datasets for 12 different topics. Experimental results on these datasets show the effectiveness of our framework for handling large-scale short and noisy messages in social media.
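
The abstract does not state the optimization problem itself, so the following is only a minimal sketch of the general idea of selecting tweets by group sparse reconstruction, not the SNSR formulation. It assumes tweets are rows of a feature matrix X, that social relations are given as a symmetric graph Laplacian L over tweets, and that the objective is 0.5 * ||X^T - X^T W||_F^2 + 0.5 * mu * tr(W^T L W) + lam * sum_i ||W_i||_2, solved by plain proximal gradient descent. The function names, the form of the social term, and the parameters lam and mu are all illustrative assumptions; the diversity regularization is left out.

```python
import numpy as np

def row_sparse_reconstruction(X, L=None, lam=0.1, mu=0.0, n_iter=500):
    """Proximal gradient sketch for a row-sparse reconstruction matrix W.

    X  : (n_tweets, n_features) tweet feature matrix (e.g. TF-IDF rows).
    L  : optional (n_tweets, n_tweets) symmetric PSD graph Laplacian that
         encodes relations between tweets (an assumed form of social term).
    lam: weight of the row-wise group sparsity penalty (assumed name).
    mu : weight of the social smoothness term (assumed name).
    """
    n = X.shape[0]
    G = X @ X.T                                  # Gram matrix of the tweets
    S = G if L is None else G + mu * L           # Hessian of the smooth part
    # Step size 1 / Lipschitz constant; S is symmetric, so eigvalsh applies.
    step = 1.0 / (np.linalg.eigvalsh(S)[-1] + 1e-12)
    W = np.zeros((n, n))
    for _ in range(n_iter):
        grad = S @ W - G                         # gradient of the smooth part
        V = W - step * grad
        # Row-wise soft thresholding: proximal operator of the l2,1 norm.
        norms = np.maximum(np.linalg.norm(V, axis=1, keepdims=True), 1e-12)
        W = np.maximum(0.0, 1.0 - step * lam / norms) * V
    return W

def select_tweets(W, k):
    """Return indices of the k tweets whose rows of W have the largest norm."""
    scores = np.linalg.norm(W, axis=1)
    return np.argsort(-scores)[:k].tolist()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((8, 5))                       # 8 toy "tweets", 5 features
    W = row_sparse_reconstruction(X, lam=0.5)
    print(select_tweets(W, k=3))
```

A row of W that shrinks to zero means the corresponding tweet is not used to reconstruct any other tweet, so the row norms act as salience scores; a larger lam yields a sparser selection.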
