A Layered Approach for Summarization and Context Learning from Microblogging Data

Twitter, a microblogging online social network, is one of the most popular information-sharing and communication platforms. Its large user base and the interactions among users generate a massive amount of data, a rich source of information for predictive modeling, sentiment analysis, opinion mining, and other text-processing tasks. Understanding the context embedded in a text corpus and generating a contextual summary of that corpus is a promising research direction in data analytics. In this paper, we present a layered graph-based approach that uses both content and structural data to analyze and summarize tweets at different levels of granularity. The proposed approach models tweets as a multi-dimensional graph and applies a random walk to identify the most informative tweets, which are then processed using LexRank, a graph-theoretic approach, to identify the most informative sentences for summary generation. Finally, the summary texts are analyzed using the TextRank algorithm to identify prominent keywords that conceptualize the context of the underlying corpus. The proposed summary generation and context learning approach is evaluated on four real-world Twitter datasets using standard information retrieval metrics.
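The three layers described above (random walk over tweets, sentence ranking, keyword ranking) can be illustrated with a minimal sketch. It is not the authors' implementation: it uses content only, with plain term-count cosine similarity as edge weights, and omits the structural dimensions of the tweet graph, the idf weighting and similarity threshold of LexRank, and the part-of-speech filtering of TextRank. All function names and parameters (`rank_texts`, `rank_keywords`, `summarize`, `damping`, `window`) are hypothetical and chosen for illustration.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    # Lowercase word tokens; hashtags and mentions are kept as single tokens.
    return re.findall(r"[a-z0-9#@']+", text.lower())


def cosine(a, b):
    # Cosine similarity between two bags of words (Counter objects).
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def random_walk_scores(weights, damping=0.85, iterations=50):
    # PageRank-style power iteration on a weighted graph given as an n x n matrix.
    # Nodes without edges keep only the teleport share (1 - damping) / n.
    n = len(weights)
    if n == 0:
        return []
    out = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping
            * sum(scores[j] * weights[j][i] / out[j] for j in range(n) if out[j] > 0)
            for i in range(n)
        ]
    return scores


def rank_texts(texts, top_k):
    # Layers 1 and 2: rank tweets (or sentences) by centrality in a similarity graph.
    bags = [Counter(tokenize(t)) for t in texts]
    n = len(texts)
    weights = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        weights[i][j] = weights[j][i] = cosine(bags[i], bags[j])
    scores = random_walk_scores(weights)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return [texts[i] for i in order[:top_k]]


def rank_keywords(text, top_k, window=2):
    # Layer 3: rank words by centrality in a co-occurrence graph.
    # With window=2, only adjacent words are linked.
    words = tokenize(text)
    vocab = sorted(set(words))
    index = {w: i for i, w in enumerate(vocab)}
    weights = [[0.0] * len(vocab) for _ in vocab]
    for pos, word in enumerate(words):
        for other in words[pos + 1 : pos + window]:
            if other != word:
                a, b = index[word], index[other]
                weights[a][b] += 1.0
                weights[b][a] += 1.0
    scores = random_walk_scores(weights)
    order = sorted(range(len(vocab)), key=lambda i: scores[i], reverse=True)
    return [vocab[i] for i in order[:top_k]]


def summarize(tweets, n_tweets=3, n_sentences=2, n_keywords=5):
    # Tweets -> most central tweets -> most central sentences -> keywords.
    top_tweets = rank_texts(tweets, n_tweets)
    sentences = [
        s for t in top_tweets for s in re.split(r"(?<=[.!?])\s+", t) if s.strip()
    ]
    summary = rank_texts(sentences, n_sentences)
    keywords = rank_keywords(" ".join(summary), n_keywords)
    return summary, keywords
```

Calling `summarize(tweets)` on a list of tweet strings returns a pair: the selected summary sentences and the top-ranked keywords. In this sketch a tweet that shares no terms with any other receives only the baseline teleport score, so it ranks below every connected tweet.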
