Web data retrieval: solving spatial range queries using k-nearest neighbor searches

As Geographic Information Systems (GIS) technologies have evolved, more and more GIS applications and geospatial data have become available on the web. Spatial objects within a given query range can be retrieved with a spatial range query, one of the most widely used query types in GIS and spatial databases. However, retrieving these data can be challenging when a web application exposes them only through a restrictive web interface that supports certain types of queries. A typical scenario is the many business web sites that provide their branch locations through a limited “nearest location” web interface. For example, the web site of a restaurant chain such as McDonald’s can be queried to find some of the branches closest to the user’s home address. However, even though the site holds the location data of all restaurants in, for example, the state of California, it is difficult to retrieve the entire data set efficiently through such an interface. Considering that k-Nearest Neighbor (k-NN) search is one of the most popular web interfaces for accessing spatial data on the web, this paper investigates the problem of retrieving geospatial data from the web for a given spatial range query using only k-NN searches. Based on a classification of k-NN interfaces on the web, we propose a set of range query algorithms that completely cover the rectangular query range (completeness) while using as few k-NN searches as possible (efficiency). We evaluated the efficiency of the proposed algorithms through statistical analysis and empirical experiments using both synthetic and real data sets.
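To make the problem concrete, the following is a minimal sketch of one way a rectangular range could be covered using only k-NN calls. It is not the paper's algorithm: it assumes a simple quadrant-splitting strategy, and the names `knn` and `range_query` are hypothetical. The `knn` function is a local stand-in for a remote "nearest location" interface. The sketch also assumes that fewer than k points share any exact location, so the recursion terminates.

```python
import math
import random


def knn(points, q, k):
    """Hypothetical stand-in for a remote k-NN web interface:
    returns the k points closest to the query point q."""
    return sorted(points, key=lambda p: math.dist(p, q))[:k]


def range_query(search, rect, k):
    """Collect every point inside rect = (x1, y1, x2, y2) using only
    calls to search(q, k). Returns (points_in_rect, number_of_calls)."""
    found, calls = set(), 0
    stack = [rect]
    while stack:
        x1, y1, x2, y2 = stack.pop()
        center = ((x1 + x2) / 2, (y1 + y2) / 2)
        result = search(center, k)
        calls += 1
        found.update(result)
        if len(result) < k:
            # Fewer than k results: the whole data set was returned.
            continue
        # Every point strictly closer than the k-th neighbor is in result.
        radius = max(math.dist(p, center) for p in result)
        half_diagonal = math.dist(center, (x2, y2))
        if radius > half_diagonal:
            # The known circle strictly contains this cell: cell is complete.
            continue
        # Otherwise split the cell into four quadrants and search each one.
        cx, cy = center
        stack.extend([(x1, y1, cx, cy), (cx, y1, x2, cy),
                      (x1, cy, cx, y2), (cx, cy, x2, y2)])
    qx1, qy1, qx2, qy2 = rect
    inside = {p for p in found
              if qx1 <= p[0] <= qx2 and qy1 <= p[1] <= qy2}
    return inside, calls


# Usage with synthetic data (assumed uniform points in the unit square).
random.seed(0)
data = [(random.random(), random.random()) for _ in range(500)]
objects, num_calls = range_query(lambda q, k: knn(data, q, k),
                                 (0.2, 0.2, 0.7, 0.6), k=10)
```

The strict comparison `radius > half_diagonal` is what preserves completeness in this sketch: a tie at the k-th distance could otherwise hide a point sitting exactly on a cell corner. The number of calls is the efficiency measure the abstract refers to, and this naive splitting makes no attempt to minimize it.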
