Paper information - Toponym resolution in text: annotation, evaluation and applications of spatial grounding

In Information Extraction (IE), processing of named entities in text has traditionally been seen as a two-step process comprising a flat text span recognition sub-task and an atomic classification sub-task; relating the text span to a model of the world has been ignored by evaluations such as DARPA/NIST's MUC or ACE. However, spatial and temporal expressions refer to events in space-time, and the grounding of events is a precondition for accurate reasoning. Thus, automatic grounding can improve many applications such as automatic map drawing (e.g., for choosing a focus) and question answering (e.g., for questions like "How far is London from Edinburgh?", given a story in which both occur and can be resolved). Whereas temporal grounding has received considerable attention in the recent past [2, 3], robust spatial grounding has long been neglected. Concentrating on geographic names for populated places, I define the task of automatic Toponym Resolution (TR) as computing the mapping from occurrences of names for places as found in a text to a representation of the extensional semantics of the location referred to (its referent), such as a geographic latitude/longitude footprint. The task of mapping from names to locations is hard due to insufficient and noisy databases, and a large degree of ambiguity: common words need to be distinguished from proper names (geo/non-geo ambiguity), and the mapping between names and locations is ambiguous (London can refer to the capital of the UK or to London, Ontario, Canada, or to about forty other Londons on earth). In addition, names of places and the boundaries referred to change over time, and databases are incomplete.
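To make the task definition concrete, the following is a minimal sketch, not the method developed in the thesis. It assumes a hypothetical two-entry toy gazetteer (`GAZETTEER`) with approximate coordinates, and it uses a simple "closest to an already resolved place" heuristic chosen purely for illustration. The helper names `candidates`, `haversine_km` and `resolve_by_proximity` are likewise invented for this example. It shows the ambiguity of "London" and how, once both names are grounded, a question such as "How far is London from Edinburgh?" reduces to a great-circle distance computation.

```python
import math

# Hypothetical toy gazetteer: toponym -> list of (referent, latitude, longitude).
# Coordinates are approximate and serve only as illustration.
GAZETTEER = {
    "London": [
        ("London, England, UK", 51.5074, -0.1278),
        ("London, Ontario, Canada", 42.9849, -81.2453),
    ],
    "Edinburgh": [
        ("Edinburgh, Scotland, UK", 55.9533, -3.1883),
    ],
}


def candidates(name):
    """Return every candidate referent for a toponym (empty list if unknown)."""
    return GAZETTEER.get(name, [])


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    earth_radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def resolve_by_proximity(name, anchor):
    """Pick the candidate closest to an already resolved anchor (lat, lon).

    Illustrative heuristic only; returns None if the name is not in the gazetteer.
    """
    options = candidates(name)
    if not options:
        return None
    return min(options, key=lambda c: haversine_km(c[1], c[2], anchor[0], anchor[1]))


if __name__ == "__main__":
    edinburgh = candidates("Edinburgh")[0]
    london = resolve_by_proximity("London", (edinburgh[1], edinburgh[2]))
    distance = haversine_km(london[1], london[2], edinburgh[1], edinburgh[2])
    print(f"{london[0]} is about {distance:.0f} km from {edinburgh[0]}")
```

A real system would also have to handle the other difficulties named above, such as geo/non-geo ambiguity and incomplete or noisy gazetteers, none of which this sketch addresses.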

Jochen L. Leidner

[1] Y. Tuan, et al. Space and Place: The Perspective of Experience, 1978.

[2] Hanan Samet, et al. The Design and Analysis of Spatial Data Structures, 1989.

[3] Herbert A. Simon, et al. Why a Diagram is (Sometimes) Worth Ten Thousand Words, 1987, Cogn. Sci.

[4] Soteria Svorou, et al. The grammar of space, 1994.

[5] Nancy A. Chinchor, et al. Overview of MUC-7, 1998, MUC.

[6] Stefan Evert, et al. The NITE XML Toolkit: Flexible annotation for multimodal language data, 2003, Behavior research methods, instruments, & computers: a journal of the Psychonomic Society, Inc.

[7] Linda L. Hill, et al. Core Elements of Digital Gazetteers: Placenames, Categories, and Footprints, 2000, ECDL.

[8] Walter L. Smith. Probability and Statistics, 1959, Nature.

[9] Linda L. Hill. Access to geographic concepts in online bibliographic files: effectiveness of current practices and the potential of a graphic interface, 1990.

[10] Mark Sanderson, et al. Word sense disambiguation and information retrieval, 1994, SIGIR '94.

[11] Adam Kilgarriff, et al. Framework and Results for English SENSEVAL, 2000, Comput. Humanit.

[12] Siobhan Chapman. Logic and Conversation, 2005.

[13] Roy T. Fielding, et al. Hypertext Transfer Protocol - HTTP/1.1, 1997, RFC.

[14] Olga Uryupina, et al. Semi-supervised learning of geographical gazetteer from the internet, 2003, Workshop On Analysis Of Geographic References.

[15] Christiane Fellbaum, et al. Book Reviews: WordNet: An Electronic Lexical Database, 1999, CL.

[16] Steffen Staab, et al. Ontology Learning for the Semantic Web, 2002, IEEE Intell. Syst.

[17] Gideon S. Mann, et al. Bootstrapping toponym classifiers, 2003, HLT-NAACL 2003.

[18] Yoko Nishimura, et al. Google Earth, 2008, Encyclopedia of GIS.

[19] Anthony G. Cohn, et al. Qualitative Spatial Representation and Reasoning with the Region Connection Calculus, 1997, GeoInformatica.

[20] Shigeo Abe. Pattern Classification, 2001, Springer London.

[21] Rohini K. Srihari, et al. A Hybrid Approach for Named Entity and Sub-Type Tagging, 2000, ANLP.

[22] Markus Neteler, et al. Open Source GIS: A GRASS GIS Approach, 2007.

[23] Zhong-ren Peng, et al. Internet GIS: Distributed Geographic Information Services for the Internet and Wireless Networks, 2003.

[24] Patrice Enjalbert, et al. Geographic reference analysis for geographic document querying, 2003, HLT-NAACL 2003.

[25] Marilyn Eileen Jessen. A semantic study of spatial and temporal expressions in English, 1974.

[26] Marc Moens, et al. Named Entity Recognition without Gazetteers, 1999, EACL.

[27] Ivar Jacobson, et al. The Unified Software Development Process, 1999.

[28] Hinrich Schütze, et al. Book Reviews: Foundations of Statistical Natural Language Processing, 1999, CL.

[29] Nancy Chinchor, et al. Overview of MUC-7, 1998, MUC.

[30] Schuyler Erle, et al. Mapping hacks: tips & tools for electronic cartography, 2005.

[31] Alexander G. Hauptmann, et al. Using Location Information from Speech Recognition of Television News Broadcasts, 1999.

[32] Ralph Grishman, et al. NYU: Description of the MENE Named Entity System as Used in MUC-7, 1998, MUC.

[33] J. Kruskal. On the shortest spanning subtree of a graph and the traveling salesman problem, 1956.

[34] Ellen M. Voorhees, et al. Overview of TREC 2004, 2004, TREC.

[35] Yiming Yang, et al. RCV1: A New Benchmark Collection for Text Categorization Research, 2004, J. Mach. Learn. Res.

[36] Ellen M. Voorhees, et al. The TREC-8 Question Answering Track Evaluation, 2000, TREC.

[37] Howard D. Wactlar, et al. Complementary video and audio analysis for broadcast news archives, 2000, CACM.

[38] Erik Rauch, et al. A confidence-based framework for disambiguating geographic terms, 2003, HLT-NAACL 2003.

[39] Cheng Niu, et al. InfoXtract location normalization: a hybrid approach to geographic references in information extraction, 2003, HLT-NAACL 2003.

[40] Jochen L. Leidner. Current Issues in Software Engineering for Natural Language Processing, 2003, HLT-NAACL 2003.

[41] Jochen L. Leidner. Toponym Resolution in Text: “Which Sheffield is it?”, 2004.

[42] Erik F. Tjong Kim Sang, et al. Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition, 2003, CoNLL.

[43] James Pustejovsky, et al. Annotation of Temporal and Event Expressions, 2003, HLT-NAACL.

[44] Klaus Krippendorff, et al. Content Analysis: An Introduction to Its Methodology, 1980.

[45] Gregory R. Crane, et al. Disambiguating Geographic Names in a Historical Digital Library, 2001, ECDL.

[46] Naomi Sager, et al. Natural Language Information Processing: A Computer Grammar of English and Its Applications, 1980.

[47] Allison Woodruff, et al. The Sequoia 2000 Electronic Repository, 1995, Digit. Tech. J.

[48] Breck Baldwin, et al. Cross-Document Event Coreference: Annotations, Experiments, and Observations, 1999, COREF@ACL.

[49] Sharon Oviatt, et al. Multimodal interactive maps: designing for human performance, 1997.

[50] Dan Wu, et al. On assigning place names to geography related web pages, 2005, Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '05).

[51] Douglas E. Appelt, et al. Deductive Question Answering from Multiple Resources, 2004, New Directions in Question Answering.

[52] Andy Shaw. AlertNet Webmap Initiative - New Media Approaches to Mapping Humanitarian Response, 2003.

[53] Andrew Tomkins, et al. How to build a WebFountain: An architecture for very large-scale text analytics, 2004, IBM Syst. J.

[54] Manfred K. Warmuth, et al. The Weighted Majority Algorithm, 1994, Inf. Comput.

[55] David Yarowsky, et al. Desperately Seeking Cebuano, 2003, NAACL.

[56] Jochen L. Leidner. A wireless natural language search engine, 2005, SIGIR '05.

[57] Anthony G. Cohn, et al. A Spatial Logic based on Regions and Connection, 1992, KR.

[58] Michael E. Lesk, et al. Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone, 1986, SIGDOC '86.

[59] Douglas E. Appelt, et al. FASTUS: A Finite-state Processor for Information Extraction from Real-world Text, 1993, IJCAI.

[60] Thomas D. Sandry, et al. Introductory Statistics With R, 2003, Technometrics.

[61] Malvina Nissim, et al. Towards a Corpus Annotated for Metonymies: the Case of Location Names, 2002, LREC.

[62] Stephen Potter, et al. A Framework for Text Mining Services, 2004.