Development and evaluation of a geographic information retrieval system using fine grained toponyms

Geographic information retrieval (GIR) is concerned with returning information in response to an information need, typically expressed in terms of a thematic and spatial component linked by a spatial relationship. However, evaluation initiatives have often failed to show significant differences between simple text baselines and more complex spatially enabled GIR approaches. We explore the effectiveness of three systems (a text baseline, spatial query expansion, and a full GIR system utilizing both text and spatial indexes) at retrieving documents from a corpus describing mountaineering expeditions, centred around fine-grained toponyms. To allow evaluation, we use user-generated content (UGC) in the form of metadata associated with individual articles to build a test collection of queries and judgments. The test collection allowed us to demonstrate that a GIR-based method significantly outperformed a text baseline for all but very specific queries associated with very small query radii. We argue that such approaches to test collection development have much to offer in the evaluation of GIR.
