Geographic co-occurrence as a tool for GIR

In this paper we describe the development of a geographic co-occurrence model and how it can be applied to geographic information retrieval (GIR). The model is built by mining co-occurrences of placenames from Wikipedia and then mapping these placenames to locations in the Getty Thesaurus of Geographic Names. We begin by quantifying the accuracy of our model and computing theoretical bounds on the accuracy achievable when it is applied to placename disambiguation in free text. We conclude with a discussion of the improvements such a model could offer over traditional methods for placename disambiguation and geographic relevance ranking.
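A co-occurrence model of this kind, and its use for disambiguation, can be illustrated with a minimal Python sketch. It assumes each Wikipedia article has already been reduced to a list of locations resolved against a gazetteer. The location IDs (`paris_fr`, `paris_tx`, ...) and all function names are hypothetical. Scoring a candidate by its summed raw co-occurrence count with the other locations in the text is an illustrative choice, not necessarily the method described in the paper.

```python
from collections import Counter
from itertools import combinations


def count_cooccurrences(articles):
    """Count how often each unordered pair of locations appears in the same article.

    `articles` is an iterable of lists of (hypothetical) gazetteer location IDs.
    A location mentioned several times in one article is counted once for that article.
    """
    pairs = Counter()
    for locations in articles:
        unique = sorted(set(locations))  # sorted so each pair has one canonical key
        for a, b in combinations(unique, 2):
            pairs[(a, b)] += 1
    return pairs


def cooccurrence(pairs, a, b):
    """Return the symmetric co-occurrence count for locations a and b."""
    key = (a, b) if a <= b else (b, a)
    return pairs.get(key, 0)


def disambiguate(candidates, context, pairs):
    """Pick the candidate location that co-occurs most with the context locations.

    Assumption: a summed raw count is used as the score; ties go to the earliest candidate.
    """
    best, best_score = None, -1
    for cand in candidates:
        score = sum(cooccurrence(pairs, cand, c) for c in context)
        if score > best_score:
            best, best_score = cand, score
    return best


if __name__ == "__main__":
    # Toy "articles", each already reduced to the locations it mentions.
    articles = [
        ["paris_fr", "lyon_fr", "france"],
        ["paris_fr", "france", "marseille_fr"],
        ["paris_tx", "dallas_tx", "texas"],
    ]
    pairs = count_cooccurrences(articles)
    # "Paris" appearing alongside Texas and Dallas resolves to the Texas city.
    print(disambiguate(["paris_fr", "paris_tx"], ["texas", "dallas_tx"], pairs))  # paris_tx
```

Raw counts favour locations that are mentioned often overall. A fuller model would probably normalize them, for example by each location's total frequency, before comparing candidates.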
