Extracting Geographic Context from the Web: GeoReferencing in MyMoSe

Many Web pages are clearly related to specific locations. Identifying this geographic focus is the cornerstone of the next generation of geographic-context-aware search services. This paper presents a multistage method for assigning a geographic focus to Web pages (GeoReferencing), using several heuristics for toponym disambiguation and a scoring function for focus determination. We also describe an experimental methodology for evaluating the accuracy of the system on Web pages in English and Spanish. The results are promising: the system reaches an accuracy of over 70% at town-level resolution.
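To make the two stages named in the abstract concrete, the following is a minimal sketch of toponym disambiguation followed by focus scoring. It is not the MyMoSe implementation: the gazetteer contents, the population figures (illustrative only), the function names `disambiguate` and `geographic_focus`, and both heuristics (regional agreement with unambiguous toponyms, then largest population) are assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical toy gazetteer: toponym -> list of (place_id, region, population).
# Population values are rough, illustrative figures only.
GAZETTEER = {
    "vigo": [("Vigo, ES", "Galicia", 295000)],
    "santiago": [
        ("Santiago de Compostela, ES", "Galicia", 96000),
        ("Santiago, CL", "Santiago Metropolitan", 5600000),
    ],
    "pontevedra": [("Pontevedra, ES", "Galicia", 83000)],
}


def disambiguate(toponyms):
    """Resolve each known toponym mention to one gazetteer entry.

    Assumed heuristic: prefer the candidate whose region matches the regions
    of unambiguous toponyms on the same page; break ties by population.
    """
    known = [t for t in toponyms if t in GAZETTEER]
    # Regions supported by mentions that have exactly one candidate.
    context = Counter(
        GAZETTEER[t][0][1] for t in known if len(GAZETTEER[t]) == 1
    )
    resolved = []
    for t in known:
        best = max(GAZETTEER[t], key=lambda c: (context[c[1]], c[2]))
        resolved.append(best[0])
    return resolved


def geographic_focus(toponyms):
    """Score resolved places by mention count and return the top one, or None."""
    scores = Counter(disambiguate(toponyms))
    if not scores:
        return None
    return scores.most_common(1)[0][0]
```

In this sketch, a page mentioning "vigo", "pontevedra", and "santiago" resolves "santiago" to the Galician city because the unambiguous mentions share its region, while a page mentioning only "santiago" falls back to the most populous candidate. A real system would likely weight mentions by position and markup as well as by frequency.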
