Robust location search from text queries

Robust, global address geocoding is challenging because no single address format applies to all geographies, and users do not always restrict themselves to well-formed addresses. Particularly in online mapping systems, users frequently enter queries with missing or conflicting information, misspellings, transposed address components, and similar variations. We present a novel system that handles these difficulties by using a combination of textual similarity and spatial coherence to guide a depth-first search over the large space of possible interpretations of a text query. The system robustly matches text subsequences of a query with text attributes (i.e., any text labels associated with an entity) in a spatial-entity database. Each matched attribute is associated with the pre-computed spatial union of all entities that have that attribute. Candidate results are formed by incrementally intersecting these unions. Experimental results demonstrate that our system supports regions with widely differing address formats without region-specific customization or training. Furthermore, we show that our system significantly outperforms commercial systems on unstructured location queries and on queries containing errors.
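
The search described above can be illustrated with a minimal Python sketch. It is not the system's implementation: the toy index `ATTRIBUTE_REGIONS`, the representation of regions as sets of grid-cell IDs, the use of `difflib.SequenceMatcher` as the similarity measure, the 0.8 threshold, and the additive scoring are all assumptions made for illustration. The sketch matches subsequences of query tokens against attribute labels, intersects the pre-computed regions of the matched attributes one at a time, and abandons any interpretation whose intersection becomes empty.

```python
from difflib import SequenceMatcher

# Hypothetical toy index: attribute text -> pre-computed union of the grid
# cells covered by every entity carrying that attribute (assumed layout).
ATTRIBUTE_REGIONS = {
    "main street": frozenset({1, 2, 7}),
    "springfield": frozenset({1, 2, 3, 4}),
    "shelbyville": frozenset({7, 8, 9}),
}


def fuzzy_matches(text, threshold=0.8):
    """Yield (attribute, region, score) for attributes textually close to text."""
    for attr, region in ATTRIBUTE_REGIONS.items():
        score = SequenceMatcher(None, text, attr).ratio()
        if score >= threshold:
            yield attr, region, score


def search(tokens, start=0, region=None, matched=(), score=0.0, results=None):
    """Depth-first search over interpretations of the token sequence."""
    if results is None:
        results = []
    if start == len(tokens):
        if matched:  # keep only interpretations that matched something
            results.append((score, matched, region))
        return results
    # Branch 1: treat the current token as noise and skip it.
    search(tokens, start + 1, region, matched, score, results)
    # Branch 2: match a subsequence beginning at the current token.
    for end in range(start + 1, len(tokens) + 1):
        text = " ".join(tokens[start:end])
        for attr, attr_region, s in fuzzy_matches(text):
            # Incremental spatial intersection with the region so far.
            new_region = attr_region if region is None else region & attr_region
            if not new_region:
                continue  # spatially incoherent interpretation: prune
            search(tokens, end, new_region, matched + (attr,), score + s, results)
    return results


def geocode(query):
    """Return interpretations as (score, attributes, region), best first."""
    return sorted(search(query.lower().split()), key=lambda r: r[0], reverse=True)


if __name__ == "__main__":
    for score, attrs, region in geocode("main stret springfeild"):
        print(round(score, 3), attrs, sorted(region))
```

In this toy setting the misspelled query "main stret springfeild" still resolves to the cells shared by "main street" and "springfield", while a conflicting query such as "springfield shelbyville" yields only single-attribute interpretations because the two regions do not overlap. A production system would need real geometries and a far more selective candidate-generation step than the linear scan used here.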
