Finding Critical Thresholds for Defining Bursts in Event Logs

A burst, i.e., an unusally high frequency of occurrence of an event in a time-window, is interesting in many monitoring applications that give rise to temporal data as it often indicates an abnormal activity. While the problem of detecting bursts from time-series data has been well addressed, the question of what choice of thresholds, on the number of events as well as on the window size, makes a window “unusally bursty” remains a relevant one. We consider the problem of finding critical values of both these thresholds. Since for most applications, we hardly have any apriori idea of what combination of thresholds is critical, the range of possible values for either threshold can be very large. We formulate finding the combination of critical thresholds as a two-dimensional search problem and design efficient deteministic and randomized divide-and-conquer heuristics. For the deterministic heuristic, we show that under some weak assumptions, the computational overhead is logarithmic in the sizes of the ranges. Under identical assumptions, the expected computational overhead of the randomized heuristic in the worst case is also logarithmic. Using data obtained from logs of medical equipment, we conduct extensive simulations that reinforce our theoretical results, and show that on average, the randomized heuristic beats its deteministic counterpart in practice.

[1]  Divesh Srivastava,et al.  What's on the grapevine? , 2009, SIGMOD Conference.

[2]  Nick Koudas,et al.  TwitterMonitor: trend detection over the twitter stream , 2010, SIGMOD Conference.

[3]  Zhi-Li Zhang,et al.  Reducing Unwanted Traffic in a Backbone Network , 2005, SRUTI.

[4]  Walter Willinger,et al.  Analysis, modeling and generation of self-similar VBR video traffic , 1994, SIGCOMM.

[5]  Translational Bounds for Factorial n and the Factorial Polynomial , 2009 .

[6]  Jon M. Kleinberg,et al.  Bursty and Hierarchical Structure in Streams , 2002, Data Mining and Knowledge Discovery.

[7]  Jan Beran,et al.  Statistics for long-memory processes , 1994 .

[8]  Xin Zhang,et al.  Better Burst Detection , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[9]  Paul Barford,et al.  Generating representative Web workloads for network and server performance evaluation , 1998, SIGMETRICS '98/PERFORMANCE '98.

[10]  Alfredo Cuzzocrea,et al.  CAMS: OLAPing Multidimensional Data Streams Efficiently , 2009, DaWaK.

[11]  Christos Faloutsos,et al.  Data mining meets performance evaluation: fast algorithms for modeling bursty traffic , 2002, Proceedings 18th International Conference on Data Engineering.

[12]  Dimitrios Gunopulos,et al.  Identifying similarities, periodicities and bursts for online search queries , 2004, SIGMOD '04.

[13]  Jure Leskovec,et al.  Meme-tracking and the dynamics of the news cycle , 2009, KDD.

[14]  Ronald L. Rivest,et al.  Introduction to Algorithms , 1990 .

[15]  Sharma Chakravarthy,et al.  Event-based lossy compression for effective and efficient OLAP over data streams , 2010, Data Knowl. Eng..

[16]  Yong Guan,et al.  Detecting Click Fraud in Pay-Per-Click Streams of Online Advertising Networks , 2008, 2008 The 28th International Conference on Distributed Computing Systems.

[17]  Dennis Shasha,et al.  Efficient elastic burst detection in data streams , 2003, KDD '03.

[18]  Alfredo Cuzzocrea Retrieving Accurate Estimates to OLAP Queries over Uncertain and Imprecise Multidimensional Data Streams , 2011, SSDBM.

[19]  Marianne Winslett,et al.  Scientific and Statistical Database Management, 21st International Conference, SSDBM 2009, New Orleans, LA, USA, June 2-4, 2009, Proceedings , 2009, SSDBM.

[20]  Yan Jia,et al.  Online Burst Detection Over High Speed Short Text Streams , 2007, International Conference on Computational Science.

[21]  Richard Sproat,et al.  Mining correlated bursty topic patterns from coordinated text streams , 2007, KDD '07.

[22]  Ravi Kumar,et al.  On the Bursty Evolution of Blogspace , 2003, WWW '03.

[23]  Ronald L. Rivest,et al.  Introduction to Algorithms, third edition , 2009 .

[24]  Yan Jia,et al.  Counting Data Stream Based on Improved Counting Bloom Filter , 2008, 2008 The Ninth International Conference on Web-Age Information Management.