DISTL: Distributed In-Memory Spatio-Temporal Event-based Storyline Categorization Platform in Social Media

Event analysis in social media is challenging due to the endless amount of information generated daily. While current research has focused strongly on detecting events, there is no clear guidance on how the resulting storylines should be processed so that they make sense to a human analyst. In this paper, we present DISTL, an event processing platform which takes as input a set of storylines (each a sequence of entities and their relationships) and processes them as follows: (1) it uses different algorithms (LDA, SVM, information gain, rule sets) to identify events with different themes and allocates storylines to them; and (2) it combines the events with location and time to narrow them down to the ones that are meaningful in a specific scenario. The output comprises sets of events in different categories. DISTL uses in-memory distributed processing that scales to high data volumes and categorizes generated storylines in near real-time. It uses Big Data tools, such as Hadoop and Spark, which have been shown to be highly efficient in handling millions of tweets concurrently.
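The two processing steps can be illustrated with a minimal single-machine sketch. It is not DISTL's implementation: a plain keyword rule set stands in for the LDA, SVM, and information-gain steps, nothing is distributed, and every name here (`Storyline`, `THEME_RULES`, `categorize`, `in_scope`, `build_events`) is a hypothetical chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Storyline:
    """Hypothetical storyline record: entities plus a location and a time."""
    entities: tuple
    lat: float
    lon: float
    timestamp: datetime


# Assumed rule set mapping each theme to trigger keywords (illustrative only).
THEME_RULES = {
    "civil_unrest": {"protest", "riot", "strike"},
    "natural_disaster": {"earthquake", "flood", "wildfire"},
}


def categorize(storyline, rules=THEME_RULES):
    """Step 1: return the theme with the most keyword hits, or None."""
    entities = {e.lower() for e in storyline.entities}
    best_theme, best_hits = None, 0
    for theme, keywords in rules.items():
        hits = len(entities & keywords)
        if hits > best_hits:
            best_theme, best_hits = theme, hits
    return best_theme


def in_scope(storyline, bbox, start, end):
    """Step 2: keep storylines inside a bounding box and a time window."""
    min_lat, min_lon, max_lat, max_lon = bbox
    return (
        min_lat <= storyline.lat <= max_lat
        and min_lon <= storyline.lon <= max_lon
        and start <= storyline.timestamp <= end
    )


def build_events(storylines, bbox, start, end):
    """Group in-scope storylines by theme; each group is one event category."""
    events = {}
    for s in storylines:
        theme = categorize(s)
        if theme is not None and in_scope(s, bbox, start, end):
            events.setdefault(theme, []).append(s)
    return events


storylines = [
    Storyline(("police", "protest", "downtown"), 38.9, -77.0, datetime(2015, 4, 27, 14, 0)),
    Storyline(("earthquake", "damage"), 35.0, 139.0, datetime(2015, 4, 27, 15, 0)),
    Storyline(("concert", "tickets"), 38.9, -77.0, datetime(2015, 4, 27, 16, 0)),
]
bbox = (38.0, -78.0, 40.0, -76.0)  # (min_lat, min_lon, max_lat, max_lon)
events = build_events(storylines, bbox, datetime(2015, 4, 27), datetime(2015, 4, 28))
```

In this sketch only the first storyline survives both steps: the second matches a theme but lies outside the bounding box, and the third matches no theme. In a distributed setting the same per-storyline functions could plausibly be applied as map and filter operations over a partitioned collection, though the abstract does not specify how DISTL structures this.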
