BurstSketch: Finding Bursts in Data Streams

A burst is a common pattern in data streams, characterized by a sudden increase in arrival rate followed by a sudden decrease. Burst detection has attracted extensive attention from the research community. In this paper, we propose a novel sketch, namely BurstSketch, to detect bursts accurately in real time. BurstSketch first uses a technique called Running Track to efficiently select potential burst items, and then monitors these items and captures the key features of the burst pattern with a technique called Snapshotting. Experimental results show that our sketch achieves a 1.75 times higher recall rate than the strawman solution.
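To make the burst pattern concrete, the following is a minimal exact checker written under an assumed windowed definition. It is not BurstSketch, and it does not implement Running Track or Snapshotting. The parameters `k` (rise/fall ratio), `threshold` (minimum count at the rise) and `max_duration` (maximum windows between rise and fall), as well as the function name `find_bursts`, are hypothetical choices for illustration only.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Tuple


def find_bursts(
    windows: Iterable[Iterable[Hashable]],
    k: float = 2.0,
    threshold: int = 10,
    max_duration: int = 3,
) -> List[Tuple[Hashable, int, int]]:
    """Exact (non-sketch) burst finder over a stream split into time windows.

    Assumed burst definition (k, threshold, max_duration are hypothetical):
      * sudden increase: in window t the item's count is >= threshold
        and >= k times its count in window t-1;
      * sudden decrease: within max_duration windows after the increase,
        its count falls to at most 1/k of the count at the increase.
    Returns (item, start_window, end_window) for every completed burst.
    """
    prev: Counter = Counter()
    # Candidates still waiting for the decrease: item -> (start, count at rise).
    open_bursts: Dict[Hashable, Tuple[int, int]] = {}
    bursts: List[Tuple[Hashable, int, int]] = []
    for t, window in enumerate(windows):
        cur = Counter(window)  # exact per-item counts for this window
        for item in set(prev) | set(cur) | set(open_bursts):
            c, p = cur.get(item, 0), prev.get(item, 0)
            if item in open_bursts:
                start, peak = open_bursts[item]
                if c * k <= peak:  # sudden decrease: burst completed
                    bursts.append((item, start, t))
                    del open_bursts[item]
                elif t - start >= max_duration:  # high rate lasted too long
                    del open_bursts[item]
            elif c >= threshold and c >= k * p:  # sudden increase
                open_bursts[item] = (t, c)
        prev = cur
    return bursts


if __name__ == "__main__":
    # "a" jumps from 2 to 20 and back to 2; "b" stays steady at 15.
    stream = [
        ["a"] * 2 + ["b"] * 15,
        ["a"] * 20 + ["b"] * 15,
        ["a"] * 2 + ["b"] * 15,
        ["b"] * 15,
        ["b"] * 15,
    ]
    print(find_bursts(stream))  # [('a', 1, 2)]
```

Because this checker keeps an exact count for every distinct item in two consecutive windows, its memory grows with the number of distinct items; a compact sketch such as BurstSketch is intended to avoid that cost.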
