Distributed Pattern Discovery in Multiple Streams

Given m groups of streams, where the i-th group contains n_i co-evolving streams (i = 1, ..., m), we want to: (i) incrementally find local patterns within a single group, (ii) efficiently obtain global patterns across groups, and, more importantly, (iii) do both in real time while limiting the information shared across groups. In this paper, we present a distributed, hierarchical algorithm that addresses these problems. Our experimental case study confirms that the proposed method performs hierarchical correlation detection efficiently and effectively.
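The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the three goals, not the authors' method. It assumes that a "pattern" is a dominant direction of the incoming values, tracked incrementally with Oja's rule (a swapped-in, textbook technique), and that each group shares a single number per time step with a coordinator. The names `OjaTracker` and `run_hierarchy`, the synthetic sine trend, and the learning rate are all hypothetical.

```python
import math
import random


class OjaTracker:
    """Tracks one dominant direction of a vector stream (Oja's rule, renormalized)."""

    def __init__(self, dim, lr=0.01):
        self.w = [1.0 / math.sqrt(dim)] * dim  # unit-norm starting direction
        self.lr = lr  # assumed step size, not taken from the paper

    def update(self, x):
        # Hidden variable: projection of the new values onto the current direction.
        y = sum(wi * xi for wi, xi in zip(self.w, x))
        # Oja's rule: nudge w toward x in proportion to y, then renormalize.
        self.w = [wi + self.lr * y * (xi - y * wi) for wi, xi in zip(self.w, x)]
        norm = math.sqrt(sum(wi * wi for wi in self.w))
        self.w = [wi / norm for wi in self.w]
        return y


def run_hierarchy(group_sizes, steps, seed=0):
    """Simulate m groups; group i has group_sizes[i] co-evolving streams."""
    rng = random.Random(seed)
    local = [OjaTracker(n) for n in group_sizes]  # (i) one local tracker per group
    coordinator = OjaTracker(len(group_sizes))    # (ii) global tracker across groups
    for t in range(steps):
        trend = math.sin(0.05 * t)  # synthetic trend shared by every stream
        summaries = []
        for tracker, n in zip(local, group_sizes):
            x = [trend + 0.1 * rng.gauss(0, 1) for _ in range(n)]
            # (iii) only one scalar per group leaves the group at each step
            summaries.append(tracker.update(x))
        coordinator.update(summaries)
    return local, coordinator


if __name__ == "__main__":
    local, coordinator = run_hierarchy([3, 5, 4], steps=2000)
    print("global pattern weights:", [round(w, 3) for w in coordinator.w])
```

In this sketch the coordinator never sees raw stream values, only the m local hidden variables, which is one simple way to read "limiting shared information across groups".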
