Concept Drift Adaptive Physical Event Detection for Social Media Streams

Event detection has long been the domain of physical sensors operating under a static-dataset assumption. The prevalence of social media and web access has led to the emergence of social, or human, sensors who report on events globally. This warrants the development of event detectors that can take advantage of the dense, high spatial and temporal resolution data provided by more than 3 billion social media users. The phenomenon of concept drift, in which the terms and signals associated with a topic change over time, renders static machine learning ineffective. To this end, we present an application for physical event detection on social sensors that improves traditional physical event detection with concept drift adaptation. Our approach updates its machine learning classifiers continuously and automatically, without the need for human intervention. It integrates data from heterogeneous sources and is designed to handle weak-signal events (landslides, wildfires) with around ten posts per event in addition to large-signal events (hurricanes, earthquakes) with hundreds of thousands of posts per event. We demonstrate a landslide detector on our application that detects almost 350% more landslides than static approaches. Our application has high performance: using classifiers trained in 2014, it achieves an event detection accuracy of 0.988, compared to 0.762 for static approaches.
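To make the contrast between a static and a drift-adaptive classifier concrete, the following is a minimal sketch and not the method of this work. It assumes a sliding-window naive Bayes text classifier, a hypothetical class name `WindowedNaiveBayes`, invented example posts, and labels that arrive alongside each post; how labels are actually obtained without human intervention is not specified in the abstract.

```python
import math
from collections import Counter, deque


class WindowedNaiveBayes:
    """Hypothetical sketch: naive Bayes over only the most recent labeled posts."""

    def __init__(self, window=200):
        # Oldest (tokens, label) pairs fall out once the window is full,
        # so stale vocabulary stops influencing predictions.
        self.buffer = deque(maxlen=window)

    def update(self, text, label):
        # Assumption: label is 1 (physical event) or 0 (irrelevant post).
        self.buffer.append((text.lower().split(), label))

    def predict(self, text):
        # Recount from the buffer on every call: simple, not efficient.
        word_counts = {0: Counter(), 1: Counter()}
        class_counts = Counter()
        for tokens, label in self.buffer:
            word_counts[label].update(tokens)
            class_counts[label] += 1
        vocab_size = len(set(word_counts[0]) | set(word_counts[1]))
        scores = {}
        for label in (0, 1):
            total = sum(word_counts[label].values())
            # Laplace-smoothed log prior plus log likelihood of each token.
            score = math.log((class_counts[label] + 1) / (len(self.buffer) + 2))
            for tok in text.lower().split():
                score += math.log(
                    (word_counts[label][tok] + 1) / (total + vocab_size + 1)
                )
            scores[label] = score
        return 1 if scores[1] > scores[0] else 0


# Invented posts: in the old period "landslide" signals a physical event.
old_stream = [
    ("landslide blocks highway", 1),
    ("landslide destroys homes", 1),
    ("landslide after heavy rain", 1),
    ("traffic jam on highway", 0),
    ("heavy rain expected today", 0),
]
# After drift, "mudslide" carries the event signal and "landslide" is noise.
new_stream = [
    ("mudslide blocks road", 1),
    ("mudslide buries village", 1),
    ("landslide song on radio", 0),
    ("love the landslide song", 0),
]

static = WindowedNaiveBayes(window=1000)  # trained once, never updated
adaptive = WindowedNaiveBayes(window=4)   # keeps only the 4 newest posts
for text, label in old_stream:
    static.update(text, label)
    adaptive.update(text, label)
for text, label in new_stream:
    adaptive.update(text, label)  # only the adaptive model sees new data

query = "landslide song playing on radio"
print("static:", static.predict(query), "adaptive:", adaptive.predict(query))
```

On this toy data the static model still treats "landslide" as an event term and flags the song post, while the windowed model has forgotten the old usage and rejects it. A fixed window is the simplest form of adaptation; the cited drift literature describes detectors and ensembles that decide when and how much to forget.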
