A Single-Pass Online Data Mining Algorithm Combined with Control Theory with Limited Memory in Dynamic Data Streams

This paper addresses a fundamental problem in data streaming scenarios: current data mining techniques are ill-equipped to handle data streams effectively and pay little attention to network stability and fast response [36]. To address this problem, we present a control-theoretic explicit rate (ER) online data mining control algorithm (ODMCA) that regulates the sending rate of mined data while accounting for the main memory occupancies of terminal nodes. The proposed method combines a distributed proportional-integral-derivative (PID) controller with data mining, and its control parameters can be designed to ensure the stability of the control loop with respect to the sending rate of mined data. We further analyze the theoretical aspects of the proposed algorithm, and simulation results show the efficiency of our scheme in terms of high main memory occupancy and fast response of both the main memory occupancy and the controlled sending rates.
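As a rough illustration of the kind of control loop described above, the following Python sketch regulates a sending rate from the main memory occupancy of a single node using proportional, integral, and derivative terms. It is not the ODMCA algorithm itself: the class name `PIDRateController`, the function `simulate`, the gains, the target occupancy, the constant drain rate, and the single-buffer plant model are all assumptions chosen only to make the example self-contained.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PIDRateController:
    """Hypothetical discrete-time PID controller: occupancy in, sending rate out."""
    kp: float          # proportional gain (assumed value, not from the paper)
    ki: float          # integral gain (assumed)
    kd: float          # derivative gain (assumed)
    target: float      # desired main memory occupancy, in items
    max_rate: float    # upper bound on the sending rate, in items per step
    integral: float = 0.0
    prev_error: Optional[float] = None

    def update(self, occupancy: float) -> float:
        """Return the next sending rate given the measured occupancy."""
        error = self.target - occupancy
        # No derivative action on the first sample, since there is no history yet.
        derivative = 0.0 if self.prev_error is None else error - self.prev_error
        self.prev_error = error

        candidate = self.integral + error
        raw = self.kp * error + self.ki * candidate + self.kd * derivative
        rate = min(max(raw, 0.0), self.max_rate)
        # Simple anti-windup: accumulate the integral only when the output is not clamped.
        if rate == raw:
            self.integral = candidate
        return rate


def simulate(steps: int = 200,
             capacity: float = 1000.0,
             drain: float = 50.0) -> Tuple[List[float], List[float]]:
    """Simulate one node whose memory fills at the controlled rate and drains at a constant rate."""
    ctrl = PIDRateController(kp=0.5, ki=0.1, kd=0.1, target=800.0, max_rate=1000.0)
    occupancy = 0.0
    occupancy_history: List[float] = []
    rate_history: List[float] = []
    for _ in range(steps):
        rate = ctrl.update(occupancy)
        # Assumed plant model: an integrator clamped to [0, capacity].
        occupancy = min(max(occupancy + rate - drain, 0.0), capacity)
        occupancy_history.append(occupancy)
        rate_history.append(rate)
    return occupancy_history, rate_history


if __name__ == "__main__":
    occ, rates = simulate()
    print(f"final occupancy: {occ[-1]:.2f}, final rate: {rates[-1]:.2f}")
```

Under these assumed gains the linearized loop has all poles inside the unit circle, so the occupancy settles at the target and the sending rate settles at the drain rate. Other gain choices may be unstable, which is the design question the stability analysis in the paper is concerned with.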

[1] Mahesh Viswanathan, et al. Testing and spot-checking of data streams (extended abstract), 2000, ACM-SIAM Symposium on Discrete Algorithms.

[2] Graham Cormode, et al. What's hot and what's not: tracking most frequent items dynamically, 2003, PODS '03.

[3] Jessica H. Fong, et al. An Approximate Lp Difference Algorithm for Massive Data Streams, 1999, Discret. Math. Theor. Comput. Sci.

[4] Sudipto Guha, et al. Clustering Data Streams, 2000, FOCS.

[5] Christopher Olston, et al. Distributed top-k monitoring, 2003, SIGMOD '03.

[6] Christos Faloutsos, et al. Online data mining for co-evolving time sequences, 2000, Proceedings of 16th International Conference on Data Engineering.

[7] S. Muthukrishnan, et al. How to Summarize the Universe: Dynamic Maintenance of Quantiles, 2002, VLDB.

[8] Geoff Hulten, et al. Mining high-speed data streams, 2000, KDD '00.

[9] Piotr Indyk, et al. Maintaining Stream Statistics over Sliding Windows, 2002, SIAM J. Comput.

[10] Jennifer Widom, et al. Continuous queries over data streams, 2001, SIGMOD Record.

[11] Joseph L. Hellerstein, et al. Using Control Theory to Achieve Service Level Objectives In Performance Management, 2002, Real-Time Systems.

[12] Johannes Gehrke, et al. A framework for measuring changes in data characteristics, 1999, PODS '99.

[13] Divesh Srivastava, et al. On computing correlated aggregates over continual data streams, 2001, SIGMOD '01.

[14] Naixue Xiong, et al. Scalable parameter tuning for AVQ, 2005, IEEE Communications Letters.

[15] Moses Charikar, et al. Finding frequent items in data streams, 2004, Theor. Comput. Sci.

[16] Noga Alon, et al. Tracking join and self-join sizes in limited storage, 1999, PODS '99.

[17] Anna C. Gilbert, et al. QuickSAND: Quick Summary and Analysis of Network Data, 2001.

[18] Rakesh Agrawal, et al. A One-Pass Space-Efficient Algorithm for Finding Quantiles, 1995, COMAD.

[19] Xi Zhang, et al. Scalable flow control for multicast ABR services in ATM networks, 2002, TNET.

[20] Noga Alon, et al. The space complexity of approximating the frequency moments, 1996, STOC '96.

[21] Jinyan Li, et al. Efficient mining of emerging patterns: discovering trends and differences, 1999, KDD '99.

[22] Naixue Xiong, et al. Data Transmission Rate Control in Computer Networks Using Neural Predictive Networks, 2004, ISPA.

[23] S. Muthukrishnan, et al. One-Pass Wavelet Decompositions of Data Streams, 2003, IEEE Trans. Knowl. Data Eng.

[24] Rajeev Motwani, et al. Computing Iceberg Queries Efficiently, 1998, VLDB.

[25] Helen J. Wang, et al. Online aggregation, 1997, SIGMOD '97.

[26] Mahesh Viswanathan, et al. An Approximate L1-Difference Algorithm for Massive Data Streams, 2002, SIAM J. Comput.

[27] Johannes Gehrke, et al. Mining Very Large Databases, 1999, Computer.

[28] Xi Zhang, et al. Scalable flow control for multicast ABR services, 1999, IEEE INFOCOM '99.

[29] Geoff Hulten, et al. Mining time-changing data streams, 2001, KDD '01.

[30] Erik D. Demaine, et al. Frequency Estimation of Internet Packet Streams with Limited Space, 2002, ESA.

[31] Piotr Indyk, et al. Maintaining stream statistics over sliding windows: (extended abstract), 2002, SODA '02.

[32] Johannes Gehrke, et al. Mining data streams under block evolution, 2002, SIGKDD Explorations.

[33] S. Muthukrishnan, et al. Surfing Wavelets on Streams: One-Pass Summaries for Approximate Aggregate Queries, 2001, VLDB.

[34] Piotr Indyk, et al. Stable distributions, pseudorandom generators, embeddings, and data stream computation, 2006, JACM.

[35] Suh-Yin Lee, et al. Single-pass algorithms for mining frequency change patterns with limited space in evolving append-only and dynamic transaction data streams, 2004, IEEE International Conference on e-Technology, e-Commerce and e-Service (EEE '04).

[36] Marcel Waldvogel, et al. A rate-based end-to-end multicast congestion control protocol, 2000, Proceedings ISCC 2000. Fifth IEEE Symposium on Computers and Communications.

[37] Yixin Chen, et al. Multi-Dimensional Regression Analysis of Time-Series Data Streams, 2002, VLDB.

[38] Bruce G. Lindsay, et al. Random sampling techniques for space efficient online computation of order statistics of large datasets, 1999, SIGMOD '99.

[39] Moses Charikar, et al. Finding frequent items in data streams, 2002, Theor. Comput. Sci.

[40] Philip S. Yu, et al. A Framework for Clustering Evolving Data Streams, 2003, VLDB.

[41] B. Ross Barmish, et al. New Tools for Robustness of Linear Systems, 1993.

[42] Yossi Matias, et al. Synopsis Data Structures for Massive Data, 2007, DIMACS Series in Discrete Mathematics and Theoretical Computer Science.

[43] Jennifer Widom, et al. Models and issues in data stream systems, 2002, PODS.

[44] Richard M. Karp, et al. A simple algorithm for finding frequent elements in streams and bags, 2003, TODS.

[45] Samuel Madden, et al. Fjording the stream: an architecture for queries over streaming sensor data, 2002, Proceedings 18th International Conference on Data Engineering.

[46] Mahesh Viswanathan, et al. Testing and Spot-Checking of Data Streams, 2000, SODA '00.