On processing continuous frequent K-N-match queries for dynamic data over networked data sources

Similarity search is a critical operation in many applications. When all attributes of objects are used to determine their similarity, most prior similarity search algorithms are easily skewed by a few attributes with high dissimilarity. The frequent k-n-match query was proposed to overcome this problem. However, the prior algorithm for processing frequent k-n-match queries is designed for static data, whose attributes are fixed, and is not suitable for dynamic data. In this paper, we therefore propose two schemes for processing continuous frequent k-n-match queries over dynamic data. First, we introduce the concept of a safe region and devise four formulae to compute safe regions. We then develop scheme CFKNMatchAD-C, which speeds up the processing of continuous frequent k-n-match queries by using safe regions to avoid unnecessary query re-evaluations. To reduce the amount of data transmitted by networked data sources, scheme CFKNMatchAD-C also uses safe regions to suppress data updates that do not affect query results. Moreover, for large-scale environments, we propose scheme CFKNMatchAD-D, which extends scheme CFKNMatchAD-C to employ multiple servers to process continuous frequent k-n-match queries. Experimental results show that schemes CFKNMatchAD-C and CFKNMatchAD-D outperform the prior algorithm in terms of average response time and the amount of network traffic produced.
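To make the query type concrete, the following is a minimal brute-force sketch of a one-shot frequent k-n-match query on static data. It is not the CFKNMatchAD-C or CFKNMatchAD-D scheme, and it does not use safe regions. The abstract does not define the query, so the sketch assumes the usual matching-based definition: the n-match difference between an object and the query is the n-th smallest per-attribute absolute difference, a k-n-match query returns the k objects with the smallest n-match difference, and a frequent k-n-match query returns the k objects that appear most often in the k-n-match answers for n ranging over [n0, n1]. The function names, the tie-breaking by object id, and the sample data are all hypothetical.

```python
from collections import Counter


def n_match_difference(p, q, n):
    """Return the n-th smallest per-attribute absolute difference (n is 1-based)."""
    diffs = sorted(abs(a - b) for a, b in zip(p, q))
    return diffs[n - 1]


def k_n_match(data, q, k, n):
    """Return ids of the k objects with the smallest n-match difference to q.

    Ties are broken by object id (an arbitrary choice made for this sketch).
    """
    ranked = sorted(data, key=lambda oid: (n_match_difference(data[oid], q, n), oid))
    return ranked[:k]


def frequent_k_n_match(data, q, k, n0, n1):
    """Return the k ids that occur most often in the k-n-match answers for n in [n0, n1]."""
    counts = Counter()
    for n in range(n0, n1 + 1):
        counts.update(k_n_match(data, q, k, n))
    ranked = sorted(counts, key=lambda oid: (-counts[oid], oid))
    return ranked[:k]


# Hypothetical data: "a" and "c" match the query closely on two of three
# attributes but are far away on the third.
data = {
    "a": (1, 1, 100),
    "b": (20, 20, 20),
    "c": (2, 2, 90),
    "d": (30, 30, 30),
}
query = (0, 0, 0)

print(frequent_k_n_match(data, query, k=2, n0=1, n1=3))  # ['a', 'c']
```

In this sample, a Euclidean 2-nearest-neighbor search would return "b" and "d", because the single highly dissimilar attribute of "a" and "c" dominates the distance. The frequent k-n-match answer favors "a" and "c", which match the query closely on most attributes.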
