Unveiling locations in geo-spatial documents

Resolving the geo-identities of addresses in emerging economies, where users rely primarily on short messaging as the means of querying, poses several daunting challenges: the lack of proper addressing schemes, the non-availability of cartographic information, and non-standardized nomenclature for geo-spatial entities such as streets and avenues, to name a few. In this work, we propose a simple and elegant approach to this problem. By treating address texts as short documents and exploiting the latent proximity information they contain (for example, landmark-like references and similarity between address texts), we transform the problem of resolving geo-identity into a search problem over short annotated geo-spatial documents collected through an extensive survey of six cities in India. Our solution spans all phases of building a geo-identity resolution system, although our emphasis is on collecting and organizing the corpus so that a search engine backend can serve the task. Through experiments on a representative test set collected from the real world, we demonstrate that this approach achieves over 94% accuracy in resolution and an order-of-magnitude reduction in system state (memory) with nearly zero false negatives, a significant improvement over the state of the art in emerging markets.