Evaluating Relevance and Reliability of Twitter Data for Risk Communication

ACKNOWLEDGMENTS
DEDICATION
LIST OF TABLES
LIST OF ILLUSTRATIONS
LIST OF ABBREVIATIONS

CHAPTER I – INTRODUCTION
  1.1 Overview
  1.2 Problem Statement
    1.2.1 Natural Hazards
    1.2.2 Risk Information
    1.2.3 Crowdsourcing in Emergency Response
    1.2.4 Data Relevance and Reliability
  1.3 Research Objectives and Outcomes
    1.3.1 Objectives
    1.3.2 Outcomes

CHAPTER II – BACKGROUND
  2.1 Risk Communication
  2.2 Crowdsourcing
  2.3 Crowdsourced Data Quality
    2.3.1 Crowdsourced Data Relevance and Reliability
  2.4 Techniques for Analyzing Crowdsourced Data
    2.4.1 Data Quality Assessment Techniques
  2.5 Summary

CHAPTER III – METHODOLOGY
  3.1 Study Site
  3.2 Data Sets and Processing
    3.2.1 Tweets of 2013 Colorado Floods
      3.2.1.1 Tweets
      3.2.1.2 Tools & Preprocessing
    3.2.2 Geospatial Data
    3.2.3 Survey Data
    3.2.4 NOAA Warning/alert Messages
    3.2.5 Official Warning and Damage Assessment Reports
  3.3 Analytics and Techniques
    3.3.1 Extraction of Relevant Risk Information: Bag-of-words Model
    3.3.2 Survey Responses to Warning/alert Message Content
    3.3.3 Evaluation of Relevance
    3.3.4 Evaluation of Reliability

CHAPTER IV – RESULTS
  4.1 Evaluation of Relevance
    4.1.1 Temporal Trend of Tweets Volume vs. Precipitation Amount
    4.1.2 Spatial Distribution of Tweets vs. the Degree of Damage
    4.1.3 Spatiotemporal Analysis of Tweets
    4.1.4 Content Analysis
    4.1.5 Cosine Similarity Comparison
    4.1.6 Relevance Score
  4.2 Evaluation of Reliability
    4.2.1 Evaluation of Text Content
    4.2.2 Evaluation of Image

CHAPTER V – DISCUSSION AND CONCLUSION
  5.1 Relevance of Tweets to Risk Communication
  5.2 Reliability of Tweets to Risk Communication
  5.3 Research Outcomes
    5.3.1 Implications for Risk Communication
    5.3.2 Implications for GIScience
  5.4 Limitations and Future Research

APPENDIX A – Code
  A.1 MongoDB Code
  A.2 R Code
APPENDIX B – Top Frequent Words & Hashtags
APPENDIX C – Examples of Identified Road/Streets
REFERENCES
