Tracing the German centennial flood in the stream of tweets: first lessons learned

Social microblogging services such as Twitter produce massive streams of georeferenced messages and geolocated status updates. This real-time source of information is invaluable for many application areas, in particular for disaster detection and response scenarios. Consequently, a considerable number of works have dealt with the acquisition, analysis and visualization of such messages. Most of these works assume not only that the percentage of georeferenced messages is sufficient to detect relevant events for a specific region and time frame, but also that the geolocations are reasonably correct in representing the places and times of the underlying spatio-temporal situation. In this paper, we review these two key assumptions based on the results of applying a visual analytics approach to a dataset of georeferenced Tweets from Germany covering eight months, during which several large-scale flooding situations occurred throughout the country. Our results confirm the potential of Twitter as a distributed 'social sensor' but at the same time highlight some caveats in interpreting immediate results. To overcome these limits, we explore incorporating evidence from other data sources, including further social media and mobile phone network metrics, to detect, confirm and refine events with respect to location and time. We summarize the lessons learned from our initial analysis by proposing recommendations and outline possible directions for future work.
