Distributed Monitoring of Frequent Items

Monitoring frequently occurring items is a recurring task in a variety of applications. Although a number of solutions have been proposed, few address the problem in a distributed networked environment, and most past solutions relied on approximate results to lower communication overhead. In this paper we introduce a new algorithm for continuously tracking frequent items over distributed data streams that provides either exact or approximate answers. We evaluated the efficiency of our method on two real-world data sets. The results show a significant reduction in communication cost compared with naive approaches and with an existing efficient algorithm, Top-K Monitoring. Because our method does not rely on approximation to reduce communication overhead and is explicitly designed for tracking frequent items, it also produces higher-quality tracking results.
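The abstract compares against "naive approaches" without describing them. As an illustration only, the sketch below assumes one plausible naive baseline: every remote site forwards each arriving item to a central coordinator, which keeps exact global counts and reports items whose count exceeds a fraction `phi` of all items seen. It gives exact answers at the cost of one message per arrival, which is the overhead a communication-efficient algorithm tries to avoid. The class `NaiveCoordinator`, the threshold `phi`, and the sample streams are hypothetical and are not taken from the paper; this is not the authors' algorithm.

```python
from collections import Counter


class NaiveCoordinator:
    """Hypothetical naive baseline (not the paper's method): each remote
    site forwards every arriving item, so answers are exact but each
    arrival costs one message."""

    def __init__(self, phi: float):
        # phi: assumed frequency threshold, as a fraction of all items seen.
        self.phi = phi
        self.counts = Counter()
        self.total = 0
        self.messages = 0

    def receive(self, site_id: int, item: str) -> None:
        # site_id is unused here; it only mirrors the distributed setting.
        # Every arrival at any site triggers one message to the coordinator.
        self.messages += 1
        self.counts[item] += 1
        self.total += 1

    def frequent_items(self) -> set:
        # Exact answer: items whose global count exceeds phi * total.
        threshold = self.phi * self.total
        return {item for item, c in self.counts.items() if c > threshold}


# Usage: three hypothetical sites, each observing its own stream.
streams = {
    0: ["a", "b", "a", "c"],
    1: ["a", "a", "d"],
    2: ["b", "a", "e"],
}
coord = NaiveCoordinator(phi=0.2)
for site, items in streams.items():
    for item in items:
        coord.receive(site, item)
print(sorted(coord.frequent_items()), coord.messages)  # ['a'] 10
```

Here "a" appears 5 times out of 10, exceeding the threshold of 2, while "b" (2 occurrences) does not; the ten messages show that the naive scheme's communication grows linearly with total stream volume.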
