What's new: finding significant differences in network data streams

Monitoring and analyzing network traffic usage patterns is vital for managing IP networks. An important problem is to provide network managers with information about changes in traffic, informing them about "what's new". Specifically, we focus on the challenge of finding significantly large differences in traffic: over time, between interfaces, and between routers. We introduce the idea of a deltoid: an item that has a large difference, whether the difference is absolute, relative, or variational. We present novel algorithms for finding the most significant deltoids in high-speed traffic data, and prove that they use small space and very small time per update, and are guaranteed to find significant deltoids with pre-specified accuracy. In experimental evaluation with real network traffic, our algorithms perform well and recover almost all deltoids. This is the first work to provide solutions capable of processing the data in a single pass, at network traffic speeds.
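
To make the notion of a deltoid concrete, the following is a minimal sketch that computes absolute and relative deltoids exactly, by brute force, for two small streams. It is not the small-space, one-pass algorithm the abstract describes: it stores a full counter per stream. The function names, the thresholds, and the add-one smoothing in the ratio are assumptions made for illustration, and the variational case is omitted because the abstract does not define it.

```python
from collections import Counter


def absolute_deltoids(stream_a, stream_b, threshold):
    """Return items whose counts differ by at least `threshold` (hypothetical helper)."""
    a, b = Counter(stream_a), Counter(stream_b)
    result = {}
    for item in a.keys() | b.keys():
        diff = abs(a[item] - b[item])  # missing items count as 0
        if diff >= threshold:
            result[item] = diff
    return result


def relative_deltoids(stream_a, stream_b, ratio):
    """Return items whose count ratio, in either direction, is at least `ratio`."""
    a, b = Counter(stream_a), Counter(stream_b)
    result = {}
    for item in a.keys() | b.keys():
        # Add-one smoothing is an assumption here, used only to avoid division by zero.
        r = (a[item] + 1) / (b[item] + 1)
        if max(r, 1 / r) >= ratio:
            result[item] = r
    return result


# Toy data: source addresses seen on one interface on two consecutive days.
yesterday = ["10.0.0.1"] * 100 + ["10.0.0.2"] * 5
today = ["10.0.0.1"] * 105 + ["10.0.0.2"] * 50 + ["10.0.0.3"] * 3

abs_result = absolute_deltoids(yesterday, today, threshold=10)
rel_result = relative_deltoids(yesterday, today, ratio=3)
print(abs_result)
print(sorted(rel_result))
```

In this toy data, 10.0.0.2 is both an absolute and a relative deltoid, while 10.0.0.3 is a relative deltoid only: its count changes by just 3, but from a base of zero. The heavy but stable 10.0.0.1 is neither, which is why the two notions are worth distinguishing.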
