Resolving the geo-identities of addresses in emerging economies, where users rely primarily on short messaging as the means of querying, poses several daunting challenges: the lack of proper addressing schemes, the non-availability of cartographic information, and the non-standardized nomenclature of geo-spatial entities such as streets and avenues, to name a few. In this work, we propose a simple and elegant approach to this problem. By treating address texts as short documents and exploiting the latent proximity information they contain (for example, landmark-like references and similarities between address texts), we transform the problem of resolving geo-identity into a search problem over short annotated geo-spatial documents, collected through an extensive survey of six cities in India. Our solution spans all phases of building a geo-identity resolution system, although our emphasis is on collecting and organizing the corpus so that a search engine backend can serve the task. Through experiments on a representative test set collected from the real world, we demonstrate that this approach achieves over 94% accuracy in resolution and an order-of-magnitude reduction in system state (memory) with nearly zero false negatives, a significant improvement over the state of the art in emerging markets.