Toponym Disambiguation Using Ontology-Based Semantic Similarity

We propose a new heuristic for toponym sense disambiguation that uses semantic similarity measures to map toponyms in text to ontology concepts. We evaluated the approach on a collection of Portuguese news articles, from which geographic entity names were extracted and manually mapped to concepts in a geospatial ontology covering the territory of Portugal. The results suggest that disambiguating toponyms with semantic similarity compares favorably with a baseline method.
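
As an illustration of the general idea, and not the heuristic evaluated in this work, the following minimal sketch picks the ontology concept for an ambiguous toponym by scoring each candidate against the other places mentioned in the same text. It assumes Resnik's information-content similarity as the measure. The toy ontology, the occurrence counts, the two "Vila Nova" candidates, and every function name are hypothetical and chosen only for the example.

```python
import math

# Hypothetical toy ontology: concept -> parent (part-of hierarchy).
# "Vila Nova #1" and "Vila Nova #2" are invented candidate senses.
PARENT = {
    "Portugal": None,
    "Lisboa district": "Portugal",
    "Porto district": "Portugal",
    "Sintra": "Lisboa district",
    "Matosinhos": "Porto district",
    "Vila Nova #1": "Lisboa district",
    "Vila Nova #2": "Porto district",
}

# Hypothetical occurrence counts of each concept in some reference corpus.
COUNTS = {
    "Portugal": 10,
    "Lisboa district": 20,
    "Porto district": 20,
    "Sintra": 15,
    "Matosinhos": 15,
    "Vila Nova #1": 10,
    "Vila Nova #2": 10,
}


def ancestors(concept):
    """Return the concept followed by all of its ancestors up to the root."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def information_content(concept):
    """IC(c) = -log p(c), where p(c) counts c and everything it subsumes."""
    total = sum(COUNTS.values())
    subsumed = sum(n for c, n in COUNTS.items() if concept in ancestors(c))
    return -math.log(subsumed / total)


def resnik(a, b):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = set(ancestors(a)) & set(ancestors(b))
    return max(information_content(c) for c in common)


def disambiguate(candidates, context):
    """Pick the candidate with the highest summed similarity to the context."""
    return max(candidates, key=lambda c: sum(resnik(c, ctx) for ctx in context))


candidates = ["Vila Nova #1", "Vila Nova #2"]
print(disambiguate(candidates, ["Sintra"]))
print(disambiguate(candidates, ["Matosinhos"]))
```

In this sketch a candidate that shares a specific ancestor with the context places (a district) scores higher than one that shares only the root, whose information content is zero. Other measures, such as Lin's or Jiang and Conrath's, could be substituted in `resnik` without changing the selection step.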
