Top-k most influential locations selection

We propose and study a new type of facility location selection query, the top-k most influential location selection query. Given a set M of customers and a set F of existing facilities, the query returns the k locations in a set C of candidate locations that have the largest influence values, where the influence of a candidate location c in C is defined as the number of customers in M that are reverse nearest neighbors of c. We first present a naive algorithm for processing the query, but it is computationally expensive and does not scale to large datasets. This motivates two more efficient branch-and-bound algorithms: the Estimation Expanding Pruning (EEP) algorithm and the Bounding Influence Pruning (BIP) algorithm. Both exploit geometric properties to prune the search space and thus achieve much better performance than the naive algorithm. Specifically, EEP estimates, for each customer, the distance to its nearest existing facility and, for each candidate location, the number of customers it influences; it then gradually refines these estimates until the answer set is found, using distance-metric-based pruning techniques to make the refinement efficient. BIP estimates only the number of influenced customers for each candidate location, but it uses the existing facilities to limit the space searched for influenced customers and so obtains better estimates, which makes it even more efficient. Extensive experiments on both real and synthetic datasets validate the efficiency of the algorithms.
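To make the query definition concrete, the following is a minimal brute-force sketch of how the influence values could be computed. It is not the paper's naive algorithm, EEP, or BIP. It assumes 2-D points, Euclidean distance, and that a customer counts as a reverse nearest neighbor of a candidate only when the candidate is strictly closer than every existing facility; the function name `top_k_influential` and the tie-breaking by candidate order are illustrative choices, not taken from the abstract.

```python
import heapq
import math


def top_k_influential(customers, facilities, candidates, k):
    """Return the k candidates with the largest influence as (location, influence).

    Assumptions (not from the source): Euclidean distance, strict "closer than"
    comparison, and ties between candidates broken by their input order.
    """
    # Distance from each customer to its nearest existing facility.
    # With no facilities at all, every candidate attracts every customer.
    nearest = [
        min((math.dist(m, f) for f in facilities), default=math.inf)
        for m in customers
    ]

    scored = []
    for idx, c in enumerate(candidates):
        # A customer is influenced by c if c would become its nearest facility.
        influence = sum(
            1 for m, d in zip(customers, nearest) if math.dist(m, c) < d
        )
        scored.append((influence, idx))

    # Largest influence first; among equal influences, the earlier candidate wins.
    best = heapq.nlargest(k, scored, key=lambda t: (t[0], -t[1]))
    return [(candidates[i], influence) for influence, i in best]


if __name__ == "__main__":
    M = [(0, 0), (1, 0), (10, 0), (11, 0)]   # customers
    F = [(5, 0)]                             # existing facilities
    C = [(0.5, 0), (10.5, 0), (5, 1)]        # candidate locations
    print(top_k_influential(M, F, C, k=2))
```

This sketch evaluates every customer against every facility and every candidate, so its cost grows with |M| * (|F| + |C|), which is the kind of exhaustive work that pruning-based algorithms aim to avoid.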
