Recognition of Implicit Geographic Movement in Text

Analyzing the geographic movement of humans, animals, and other phenomena is a growing field of research. This research has benefited urban planning, logistics, the understanding of animal migration, and much more. Typically, movement is captured as precise geographic coordinates and time stamps with Global Positioning Systems (GPS). Although some research uses computational techniques to take advantage of implicit movement in descriptions of route directions, hiking paths, and historical exploration routes, innovation would accelerate with a large and diverse corpus. We created a corpus of sentences, each labeled as describing geographic movement or not and, where movement is described, with the type of entity moving. Creating this corpus proved difficult because there were no comparable corpora to start from, human labeling is costly, and movement can at times be interpreted differently. To overcome these challenges, we developed an iterative process employing hand labeling, crowd voting for confirmation, and machine learning to predict more labels. By merging advances in word embeddings with traditional machine learning models and model ensembling, we reached a prediction accuracy acceptable for producing a large silver-standard corpus despite the small gold-standard training set. Our corpus will likely benefit computational processing of geography in text and spatial cognition, in addition to detection of movement.
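
The iterative labeling process described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the label strings, the 0.6 crowd-agreement threshold, and the 0.8 confidence threshold are all hypothetical, and the averaging of per-model probabilities (soft voting) is only one assumed form of model ensembling. The sketch shows two steps: confirming a hand label by crowd vote, and accepting an ensemble prediction as a silver-standard label only when the models are jointly confident.

```python
from collections import Counter


def crowd_confirm(votes, min_agreement=0.6):
    """Return the majority label if enough voters agree, else None.

    `min_agreement` is a hypothetical threshold, not a value from the paper.
    """
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) >= min_agreement else None


def ensemble_probability(probabilities):
    """Soft voting: average each model's probability that the sentence
    describes geographic movement (an assumed ensembling scheme)."""
    return sum(probabilities) / len(probabilities)


def silver_label(probabilities, threshold=0.8):
    """Accept a predicted label only when the ensemble is confident.

    Returns None for uncertain sentences, which in an iterative process
    would go back to hand labeling or crowd voting.
    """
    p = ensemble_probability(probabilities)
    if p >= threshold:
        return "movement"
    if p <= 1 - threshold:
        return "no_movement"
    return None


if __name__ == "__main__":
    # Three crowd workers vote on one hand-labeled sentence.
    print(crowd_confirm(["movement", "movement", "no_movement"]))
    # Three hypothetical models score one unlabeled sentence.
    print(silver_label([0.9, 0.85, 0.95]))
```

Returning `None` for low-agreement or low-confidence cases keeps uncertain sentences out of both the gold-standard and silver-standard sets, which matters when, as noted above, movement can be interpreted differently by different readers.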
