Using semantic graphs to detect overlapping target events and story lines from newspaper articles

Event detection from text data is an active area of research. While the literature has emphasized event identification and labeling from a single data source, this work considers event and story line detection across a large number of data sources. In this setting, it is natural for different events in the same domain (e.g., violence, sports, or politics) to occur at the same time, and for different story lines about the same event to emerge. To capture events in this setting, we propose an Offline algorithm that, given a news article collection, detects events in a target domain and the story lines within those events. Our algorithm leverages a multi-relational sentence-level semantic graph and well-known graph properties to identify overlapping events and the story lines within them. We then extend this algorithm to an Online setting. Both the Offline and Online approaches are evaluated on two large data sets containing millions of news articles from a large number of sources. Our empirical analysis shows that methods using the proposed semantic graph outperform the state of the art in precision and recall while providing more complete event summaries.
