Automated spatiotemporal and semantic information extraction for hazards

PhD (Doctor of Philosophy) thesis.

To everyone who has supported and helped me over the years.

ACKNOWLEDGMENTS

It has been a long journey since I started my Ph.D. study at the University of Iowa. During these six years, I received a great deal of support from my adviser, professors, family, and friends. First of all, I would like to give my sincerest thanks to my adviser, Dr. Kathleen Stewart, for her enormous contributions of advice and time. She brought me into this research field to start the journey, inspired me to explore new paths, and kept me on the right track. I also want to thank my committee members for their time and support. Thanks to all the professors and colleagues in the Department of Geographical and Sustainability Science at the University of Iowa for their help during my studies. I would like to thank my family and friends for all their support and love during the six years of my Ph.D. study. Last but not least, I want to give my best thanks to my husband, who gave me a great deal of support and encouragement throughout my studies. Without his support, I could not have reached the destination of this journey. Thank you to all of you.

ABSTRACT

This dissertation explores three research topics related to the automated extraction of spatiotemporal and semantic information about hazard events from Web news reports and other social media. The dissertation makes a unique contribution by bridging geographic information science, geographic information retrieval, and natural language processing. Geographic information retrieval and natural language processing techniques are applied to extract spatiotemporal and semantic information automatically from Web documents and to retrieve information about patterns of hazard events that are not explicitly described in the texts. Chapters 2, 3 and 4 can be regarded as three standalone journal papers.
The research topics covered by the three chapters are related to one another and are presented sequentially. Chapter 2 begins with an investigation of methods for automatically extracting spatial and temporal information about hazards from Web news reports. A set of rules is developed to combine the spatial and temporal information contained in the reports, based on how this information is presented in the text, in order to capture the dynamics of hazard events (e.g., changes in event locations, new events occurring) as they …
