Geo-referencing with semi-automatic gazetteer expansion using lexico-syntactical patterns and co-reference analysis

Geo-referencing is a key task in geographical information retrieval because it associates unstructured textual documents (e.g., Web pages) with geographical locations, which geo-search engines then use to index documents and to search information by spatial criteria. This work proposes a strategy for extracting geo-references from textual documents that combines natural language processing techniques with co-reference resolution heuristics, and whose output can in turn be used to expand a geographical gazetteer. Implicit geographical entities (i.e., those referred to by pronouns) are recognized and incorporated into the gazetteer, which is thereby updated and then used for geo-referencing tasks. Experiments show the promise of the approach for geo-referencing Web pages that contain implicit and/or indirect geo-references.
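To make the general idea concrete, the following is a minimal sketch, not the method evaluated in this work. It assumes a single hand-written lexico-syntactic pattern ("X, a city near Y"), a toy seed gazetteer with illustrative coordinates, and a crude co-reference heuristic that links a pronoun to the earliest place named in the most recent sentence that mentions one. All names (`PATTERN`, `expand_gazetteer`, `resolve_pronouns`) are hypothetical, and letting a new entry inherit the coordinates of the known place is a simplifying assumption.

```python
import re

# Hypothetical pattern: "<Candidate>, a city near <Known place>".
NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"
PATTERN = re.compile(
    rf"({NAME}), (?:a|the) (?:city|town|village) (?:in|near) ({NAME})"
)
PRONOUN = re.compile(r"\b(?:it|there)\b", re.IGNORECASE)


def expand_gazetteer(text, gazetteer):
    """Add candidate names that the pattern links to an already known place.

    Assumption: the new entry inherits the known place's coordinates as a
    rough approximation. Returns the entries that were added.
    """
    added = {}
    for match in PATTERN.finditer(text):
        candidate, anchor = match.group(1), match.group(2)
        if anchor in gazetteer and candidate not in gazetteer:
            added[candidate] = gazetteer[anchor]
    gazetteer.update(added)
    return added


def resolve_pronouns(sentences, gazetteer):
    """Link each sentence to a place, explicitly or through a pronoun.

    Heuristic (an assumption): the earliest place named in a sentence is
    treated as its focus, and a later sentence containing only a pronoun
    is linked to the most recent focus.
    """
    references = []
    focus = None
    for index, sentence in enumerate(sentences):
        mentioned = [place for place in gazetteer if place in sentence]
        if mentioned:
            focus = min(mentioned, key=sentence.find)
            references.append((index, focus, "explicit"))
        elif focus is not None and PRONOUN.search(sentence):
            references.append((index, focus, "implicit"))
    return references


# Toy seed gazetteer; the coordinates are illustrative only.
seed = {"Buenos Aires": (-34.60, -58.38)}
sentences = [
    "Tandil, a city near Buenos Aires, hosts a university.",
    "It is known for its hills.",
]
new_entries = expand_gazetteer(" ".join(sentences), seed)
links = resolve_pronouns(sentences, seed)
```

Here the first sentence adds "Tandil" to the gazetteer, and the pronoun in the second sentence is then linked to it as an implicit geo-reference. A realistic system would need part-of-speech information, a larger pattern set, and a proper co-reference model rather than these simple rules.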
