A Comparison of Approaches for Geospatial Entity Extraction from Wikipedia

This paper addresses the challenge of extracting geospatial data from the article text of the English Wikipedia. We present the results of a Hidden Markov Model (HMM) based approach to identifying location-related named entities in our corpus of Wikipedia articles, which are primarily about battles and wars because of their high geospatial content. The HMM-based named entity recognition (NER) step drives a geocoding and resolution process whose goal is to determine the correct coordinates for each place name, a task often referred to as grounding. We compare our results against a previously developed data structure and algorithm for disambiguating place names that can have multiple coordinates. We demonstrate an overall F-measure of 79.63% for identifying and geocoding place names. Finally, we compare the results of the HMM-driven process to earlier work using a Support Vector Machine.
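To make the HMM tagging step concrete, the following is a minimal sketch of Viterbi decoding over two states, a location tag and an "other" tag. It is not the model evaluated in this paper: the state set, the probability tables (`START`, `TRANS`, `EMIT`, `UNKNOWN`) and the function names are hypothetical, hand-set values chosen only for illustration, whereas a real tagger would estimate them from annotated text.

```python
import math

STATES = ("O", "LOC")  # "O" = not a place name, "LOC" = place name

# Hypothetical, hand-set parameters (assumptions for illustration only).
START = {"O": 0.8, "LOC": 0.2}
TRANS = {
    "O": {"O": 0.8, "LOC": 0.2},
    "LOC": {"O": 0.6, "LOC": 0.4},
}
EMIT = {
    "O": {"the": 0.3, "battle": 0.2, "of": 0.3, "was": 0.1},
    "LOC": {"gettysburg": 0.5, "pennsylvania": 0.4},
}
# Small fallback probability for words a state has never emitted.
UNKNOWN = {"O": 0.01, "LOC": 0.01}


def emit_logp(state, word):
    """Log emission probability of a (lowercased) word in a given state."""
    return math.log(EMIT[state].get(word.lower(), UNKNOWN[state]))


def viterbi(tokens):
    """Return the most probable tag sequence for tokens under the toy HMM."""
    if not tokens:
        return []
    # scores[i][s]: best log-probability of any path ending in state s at token i.
    scores = [{s: math.log(START[s]) + emit_logp(s, tokens[0]) for s in STATES}]
    back = []  # back[i][s]: best predecessor of state s at token i + 1
    for word in tokens[1:]:
        prev = scores[-1]
        row, ptr = {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            row[s] = prev[best] + math.log(TRANS[best][s]) + emit_logp(s, word)
            ptr[s] = best
        scores.append(row)
        back.append(ptr)
    # Follow the back-pointers from the best final state.
    state = max(STATES, key=lambda s: scores[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    sentence = "The battle of Gettysburg".split()
    print(list(zip(sentence, viterbi(sentence))))
```

With these toy parameters, only "Gettysburg" is tagged `LOC` in the sample sentence. In a pipeline of the kind described above, tokens tagged as locations would then be passed to the geocoding and resolution step.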