ETree: Effective and Efficient Event Modeling for Real-Time Online Social Media Networks

Online social media networks (OSMNs) such as Twitter provide great opportunities for public engagement and event information dissemination. Event-related discussions occur in real time and on a worldwide scale. However, these discussions take the form of short, unstructured messages that are dynamically woven into daily chats and status updates. Compared with traditional news articles, this rich and diverse user-generated content raises unique challenges for tracking and analyzing events. Effective and efficient event modeling is thus essential for real-time, information-intensive OSMNs. In this work, we propose ETree, an effective and efficient event modeling solution for social media network sites. Targeting the unique challenges of this problem, ETree consists of three key components: (1) an n-gram-based content analysis technique for identifying core information blocks from a large number of short messages, (2) an incremental and hierarchical modeling technique for identifying and constructing event theme structures at different granularities, and (3) an enhanced temporal analysis technique for identifying inherent causalities between information blocks. Detailed evaluation using 3.5 million tweets over a 5-month period demonstrates that ETree can efficiently generate high-quality event structures and identify inherent causal relationships with high accuracy.
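To make the first component more concrete, the following is a minimal Python sketch of n-gram-based grouping of short messages. It is not ETree's actual algorithm, which the abstract does not specify; the function names (`tokenize`, `frequent_ngrams`, `group_by_key_phrase`), the bigram default, the `min_support` threshold, and the tie-breaking rule are all assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict


def tokenize(message):
    # Lowercase and keep word-like tokens, including hashtags and mentions.
    return re.findall(r"[a-z0-9#@']+", message.lower())


def ngrams(tokens, n):
    # All contiguous word n-grams of length n.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def frequent_ngrams(messages, n=2, min_support=2):
    # Count each n-gram once per message, then keep those that appear
    # in at least `min_support` messages (an assumed threshold).
    counts = Counter()
    for message in messages:
        counts.update(set(ngrams(tokenize(message), n)))
    return {gram: c for gram, c in counts.items() if c >= min_support}


def group_by_key_phrase(messages, n=2, min_support=2):
    # Assign each message to its most widely shared frequent n-gram.
    # Ties are broken by lexical order of the n-gram (an arbitrary choice).
    frequent = frequent_ngrams(messages, n, min_support)
    blocks = defaultdict(list)
    for message in messages:
        candidates = [g for g in set(ngrams(tokenize(message), n)) if g in frequent]
        if candidates:
            best = max(candidates, key=lambda g: (frequent[g], g))
            blocks[" ".join(best)].append(message)
    return dict(blocks)


messages = [
    "Earthquake hits Chile tonight",
    "earthquake hits chile, stay safe",
    "Lunch was great today",
]
blocks = group_by_key_phrase(messages)
```

Here the two earthquake messages share the bigrams "earthquake hits" and "hits chile" and end up in one group, while the unrelated message is left ungrouped. A real system would also need to merge overlapping key phrases and process messages incrementally as they arrive, which this sketch does not attempt.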
