Communication-Efficient Distributed Multiple Reference Pattern Matching for M2M Systems

In M2M applications, ad hoc snapshot queries commonly require fast responses from many local machines across which the data are distributed. When the query is complex, the communication cost of sending it to every local machine for processing can be very high. This paper addresses that issue. Given a reference set of multiple large patterns, we propose an approach for identifying its k nearest and farthest neighbors globally across all local machines. By decomposing the reference patterns into a multi-resolution representation and using novel distance bound designs, our method guarantees exact results in a communication-efficient manner. Analytical and empirical studies show that our method outperforms state-of-the-art methods by saving significant bandwidth, especially for large numbers of machines and large reference patterns.
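
To make the idea of multi-resolution decomposition with distance bounds concrete, the following is a minimal sketch, not the method proposed in this paper. It assumes an orthonormal Haar transform as the multi-resolution representation and uses generic bounds that follow from Parseval's identity and the triangle inequality. The function names `haar_transform` and `distance_bounds`, the prefix length `m`, and the idea of shipping only a coefficient prefix plus one residual-energy scalar are all illustrative assumptions.

```python
import math


def haar_transform(x):
    """Orthonormal Haar transform of a sequence whose length is a power of two.

    Coefficients are returned coarse to fine: the scaled overall average
    first, then detail coefficients from the coarsest to the finest level.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    approx = [float(v) for v in x]
    details = []
    s = math.sqrt(2.0)
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        # Details computed later are coarser, so they go in front.
        details = [(a - b) / s for a, b in pairs] + details
        approx = [(a + b) / s for a, b in pairs]
    return approx + details


def distance_bounds(q_coeffs, x_coeffs, m):
    """Bound the Euclidean distance between two series from a coefficient prefix.

    Hypothetical setting: a local machine holds all of x_coeffs but has
    received only the first m coefficients of the reference pattern plus
    the energy of the remaining ones (a single number).
    """
    head = sum((a - b) ** 2 for a, b in zip(q_coeffs[:m], x_coeffs[:m]))
    q_rest = math.sqrt(sum(c * c for c in q_coeffs[m:]))
    x_rest = math.sqrt(sum(c * c for c in x_coeffs[m:]))
    lower = math.sqrt(head)                           # dropped terms are >= 0
    upper = math.sqrt(head + (q_rest + x_rest) ** 2)  # triangle inequality
    return lower, upper


if __name__ == "__main__":
    q = [1, 2, 3, 4, 5, 6, 7, 8]   # reference pattern (made-up data)
    x = [2, 2, 2, 2, 9, 9, 9, 9]   # local candidate (made-up data)
    cq, cx = haar_transform(q), haar_transform(x)
    for m in (1, 2, 4, 8):
        print(m, distance_bounds(cq, cx, m))
```

Because the transform is orthonormal, the exact distance always lies between the two bounds, and the gap closes as `m` grows. A candidate whose lower bound already exceeds the current k-th best upper bound can be discarded without transmitting finer coefficients, which is the kind of pruning that saves bandwidth.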
