Toponym Disambiguation by Arborescent Relationships

Problem statement: A place in geographical space can be referred to formally, by its spatial coordinates, or informally, in natural language, by a toponym (place name). A single toponym can denote several geographical places, and this ambiguity makes it difficult to convert a toponym into a unique formal representation. Toponym Disambiguation (TD) in text is the task of assigning a unique location to an ambiguous place name in a given textual context. Approach: Several toponym disambiguation heuristics assume geographical proximity between the toponyms of the same context. This proximity can be measured in terms of spatial distance or in terms of arborescent relationships, i.e., proximity in the hierarchical tree of the world's places. This study presents a new heuristic for toponym disambiguation in text, based on quantifying the arborescent proximity between toponyms. The quantification uses a new measure of geographical correlation that we call the Geographical Density. Results: Our method was compared with state-of-the-art methods on the GeoSemCor corpus and outperformed them in terms of recall (87.4%) and coverage (99.0%). The results showed that toponyms of the same context are much closer in terms of arborescent relationships than in terms of spatial relationships. Conclusion: We believe that quantifying the arborescent relationships between toponyms of the same textual context is a good way to improve the recall of the TD task. However, all types of arborescent relationships must be considered, not only meronymy, which is the relation most exploited in existing TD methods.
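To make the notion of arborescent proximity concrete, the sketch below measures how far apart two places are in a small containment tree and uses that distance to choose between two referents of an ambiguous name. It is an illustration only: the abstract does not define the Geographical Density measure, so a plain tree-path distance is swapped in here, and the hierarchy, the place labels, and the function names (`path_to_root`, `tree_distance`, `disambiguate`) are all hypothetical.

```python
# Toy containment hierarchy (child -> parent). All entries are illustrative
# assumptions, not data from the paper or from any real gazetteer.
PARENT = {
    "Europe": "World",
    "North America": "World",
    "France": "Europe",
    "United States": "North America",
    "Texas": "United States",
    "Paris (France)": "France",
    "Lyon": "France",
    "Marseille": "France",
    "Paris (Texas)": "Texas",
    "Dallas": "Texas",
}


def path_to_root(place):
    """Return the nodes from `place` up to the root, inclusive."""
    path = [place]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def tree_distance(a, b):
    """Count the edges on the path between a and b in the hierarchy."""
    depth_from_b = {node: i for i, node in enumerate(path_to_root(b))}
    for steps_from_a, node in enumerate(path_to_root(a)):
        if node in depth_from_b:  # first shared ancestor
            return steps_from_a + depth_from_b[node]
    raise ValueError("places do not share a root")


def disambiguate(candidates, context):
    """Pick the candidate with the smallest total tree distance to the context.

    This stands in for the paper's Geographical Density, which is not
    specified in the abstract.
    """
    return min(
        candidates,
        key=lambda c: sum(tree_distance(c, place) for place in context),
    )


if __name__ == "__main__":
    referents = ["Paris (France)", "Paris (Texas)"]
    print(disambiguate(referents, ["Lyon", "Marseille"]))
    print(disambiguate(referents, ["Dallas"]))
```

With a context of other French cities the French referent wins, and with a Texan context the Texan one does. A spatial heuristic would instead compare coordinate distances, which is the contrast the abstract draws.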
