Toponym Resolution in Text: “Which Sheffield is it?”

Named entity tagging comprises two sub-tasks, identifying a text span and classifying it, but this view ignores the relationship between the entities and the world. Spatial and temporal entities ground events in space-time, and this relationship is vital for applications such as question answering and event tracking. There has been much recent work on the temporal dimension [13, 10], but no extensive study of the spatial dimension. Spatial named entities are often referentially ambiguous: Sheffield, for instance, may denote the city in England or one of several towns of that name in the United States. I propose to investigate how such entities can be automatically resolved with respect to an extensional coordinate model (toponym resolution), using hybrid heuristic/statistical methods. The major contributions of this research project are a corpus of text manually annotated for spatial named entities with their model correlates, intended as a training/evaluation resource [4], and a novel method to spatially ground toponyms in text.
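To make the resolution task concrete, the sketch below implements a deliberately simple heuristic baseline. It is not the hybrid heuristic/statistical method proposed here. It assumes a hypothetical toy gazetteer (`GAZETTEER`, with approximate coordinates for illustration only) and a hypothetical `resolve` function, and it combines two common heuristics: each name receives a single referent per document (in the spirit of one sense per discourse), and an ambiguous name is resolved to the candidate geographically closest to the unambiguous toponyms mentioned in the same document.

```python
import math

# Hypothetical toy gazetteer: toponym -> list of (referent label, latitude, longitude).
# Coordinates are approximate and for illustration only.
GAZETTEER = {
    "Sheffield": [
        ("Sheffield, England", 53.38, -1.47),
        ("Sheffield, Alabama", 34.77, -87.70),
    ],
    "Leeds": [("Leeds, England", 53.80, -1.55)],
    "Huntsville": [("Huntsville, Alabama", 34.73, -86.59)],
}

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def resolve(toponyms):
    """Map each known toponym in a document to a single gazetteer referent.

    Unambiguous names act as spatial anchors. Each ambiguous name takes the
    candidate with the smallest total distance to those anchors, or its first
    listed candidate when the document contains no anchors. Names missing
    from the gazetteer are skipped.
    """
    # Deduplicate while keeping first-mention order: one referent per name.
    names = [t for t in dict.fromkeys(toponyms) if t in GAZETTEER]
    anchors = [GAZETTEER[n][0] for n in names if len(GAZETTEER[n]) == 1]

    resolved = {}
    for name in names:
        candidates = GAZETTEER[name]
        if len(candidates) == 1 or not anchors:
            resolved[name] = candidates[0][0]
            continue

        def total_distance(cand):
            return sum(haversine_km(cand[1], cand[2], a[1], a[2]) for a in anchors)

        resolved[name] = min(candidates, key=total_distance)[0]
    return resolved
```

For a document mentioning Sheffield alongside Leeds, `resolve(["Sheffield", "Leeds", "Sheffield"])` grounds both mentions of Sheffield in England, whereas `resolve(["Sheffield", "Huntsville"])` selects Sheffield, Alabama. A distance-minimising baseline of this kind fails when a document mentions no unambiguous places, which is one reason to complement such heuristics with statistical evidence.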

[1] David Yarowsky et al. One Sense Per Discourse. HLT, 1992.

[2] Helena Ahonen-Myka et al. Topic Detection and Tracking with Spatio-Temporal Evidence. ECIR, 2003.

[3] Rohini K. Srihari et al. A Hybrid Approach for Named Entity and Sub-Type Tagging. ANLP, 2000.

[4] Grace Hui Yang et al. Structured use of external knowledge for event-based open domain question answering. SIGIR, 2003.

[5] Allison Woodruff et al. The Sequoia 2000 Electronic Repository. Digit. Tech. J., 1995.

[6] Jochen L. Leidner et al. Grounding spatial named entities for information extraction and question answering. HLT-NAACL, 2003.

[7] Gideon S. Mann et al. Bootstrapping toponym classifiers. HLT-NAACL, 2003.

[8] Cheng Niu et al. Location Normalization for Information Extraction. COLING, 2002.

[9] Allison Woodruff et al. GIPSY: Automated Geographic Indexing of Text Documents. J. Am. Soc. Inf. Sci., 1994.

[10] Jochen L. Leidner. Towards a Reference Corpus for Automatic Toponym Resolution Evaluation. 2004.

[11] Inderjeet Mani et al. Robust Temporal Processing of News. ACL, 2000.

[12] Mark Steedman et al. Wide-Coverage Semantic Representations from a CCG Parser. COLING, 2004.

[13] Erik Rauch et al. A confidence-based framework for disambiguating geographic terms. HLT-NAACL, 2003.

[14] Cheng Niu et al. InfoXtract location normalization: a hybrid approach to geographic references in information extraction. HLT-NAACL, 2003.

[15] Gregory R. Crane et al. Disambiguating Geographic Names in a Historical Digital Library. ECDL, 2001.