Optimal-location queries over spatial databases

We study optimal-location queries in spatial databases. Given a set S of sites, a set O of objects, and a spatial region Q, an optimal-location query returns a location in Q such that, if a new site is built at that location, the total benefit to the objects is maximized. Optimal-location queries are of interest in many applications, such as corporate decision-support systems. Many variations of the problem exist because different applications define optimality differently. In this thesis, we consider the geometric proximity between objects and sites and study the two most intuitive definitions of optimality, namely Max-Inf and Min-Dist. Max-Inf measures the total benefit as the number of objects that are closer to the new site than to any existing site, and aims to maximize it. Min-Dist measures the benefit as the reduction in distance from objects to their nearest sites, and thus aims to minimize the average distance from each object to its nearest site. We examine the problem under three distance metrics: L1, L2, and network shortest path. We propose efficient solutions for each of the resulting optimal-location queries (six in total) and evaluate them experimentally. We expect that this work will not only advance spatial database research but also benefit end users who deal with large volumes of spatial data.
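
To make the two definitions concrete, the following is a minimal brute-force sketch in Python. It is not the method proposed in the thesis: it assumes the region Q has been reduced to a small finite list of candidate points (a simplification made here for illustration only), it covers the L1 and L2 metrics but not network shortest path, and all function names are hypothetical.

```python
import math

def l1(p, q):
    """L1 (Manhattan) distance between two 2-D points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def l2(p, q):
    """L2 (Euclidean) distance between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def nearest_site_distance(obj, sites, dist):
    """Distance from an object to its nearest existing site."""
    return min(dist(obj, s) for s in sites)

def max_inf_score(candidate, sites, objects, dist):
    """Max-Inf: number of objects strictly closer to the candidate
    than to any existing site (higher is better)."""
    return sum(
        1 for o in objects
        if dist(o, candidate) < nearest_site_distance(o, sites, dist)
    )

def min_dist_score(candidate, sites, objects, dist):
    """Min-Dist: average distance from each object to its nearest site
    once the candidate has been added (lower is better)."""
    total = sum(
        min(dist(o, candidate), nearest_site_distance(o, sites, dist))
        for o in objects
    )
    return total / len(objects)

def best_candidates(candidates, sites, objects, dist):
    """Assumption: Q is approximated by a finite candidate list.
    Returns (best under Max-Inf, best under Min-Dist)."""
    best_inf = max(candidates, key=lambda c: max_inf_score(c, sites, objects, dist))
    best_avg = min(candidates, key=lambda c: min_dist_score(c, sites, objects, dist))
    return best_inf, best_avg
```

For example, with one existing site at (0, 0), objects at (4, 0), (5, 0), and (0, 1), and the L1 metric, a candidate at (4, 0) attracts two objects under Max-Inf and lowers the average nearest-site distance from 10/3 to 2/3. The sketch costs time proportional to the number of candidates times objects times sites, which is the kind of cost that indexed solutions are meant to avoid.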
