Paper information - Toponym Resolution in Text: “Which Sheffield is it?”

Toponym Resolution in Text: “Which Sheffield is it?”

Named entity tagging comprises the sub-tasks of identifying a text span and classifying it, but this view ignores the relationship between the entities and the world. Spatial and temporal entities ground events in space-time, and this relationship is vital for applications such as question answering and event tracking. There is much recent work regarding the temporal dimension [13, 10], but no extensive study of the spatial dimension. I propose to investigate how spatial named entities (which are often referentially ambiguous) can be automatically resolved with respect to an extensional coordinate model (toponym resolution), using hybrid heuristic/statistical methods. The major contributions of this research project are a corpus of text manually annotated for spatial named entities with their model correlates as a training/evaluation resource [4] and a novel method to spatially ground toponyms in text.

Jochen L. Leidner

[1] David Yarowsky, et al. One Sense Per Discourse, 1992, HLT.

[2] Helena Ahonen-Myka, et al. Topic Detection and Tracking with Spatio-Temporal Evidence, 2003, ECIR.

[3] Rohini K. Srihari, et al. A Hybrid Approach for Named Entity and Sub-Type Tagging, 2000, ANLP.

[4] Grace Hui Yang, et al. Structured use of external knowledge for event-based open domain question answering, 2003, SIGIR.

[5] Allison Woodruff, et al. The Sequoia 2000 Electronic Repository, 1995, Digit. Tech. J.

[6] Jochen L. Leidner, et al. Grounding spatial named entities for information extraction and question answering, 2003, HLT-NAACL 2003.

[7] Gideon S. Mann, et al. Bootstrapping toponym classifiers, 2003, HLT-NAACL 2003.

[8] Cheng Niu, et al. Location Normalization for Information Extraction, 2002, COLING.

[9] Allison Woodruff, et al. GIPSY: Automated Geographic Indexing of Text Documents, 1994, J. Am. Soc. Inf. Sci.

[10] Jochen L. Leidner. Towards a Reference Corpus for Automatic Toponym Resolution Evaluation, 2004.

[11] Inderjeet Mani, et al. Robust Temporal Processing of News, 2000, ACL.

[12] Mark Steedman, et al. Wide-Coverage Semantic Representations from a CCG Parser, 2004, COLING.

[13] Erik Rauch, et al. A confidence-based framework for disambiguating geographic terms, 2003, HLT-NAACL 2003.

[14] Cheng Niu, et al. InfoXtract location normalization: a hybrid approach to geographic references in information extraction, 2003, HLT-NAACL 2003.

[15] Gregory R. Crane, et al. Disambiguating Geographic Names in a Historical Digital Library, 2001, ECDL.