Detecting anomaly in data streams by fractal model

Detecting anomalies in data streams attracts great attention in both the academic and industrial communities because of its wide range of applications in venture analysis, network monitoring, trend analysis, and so on. However, existing anomaly detection methods suffer from three problems: 1) they generate a large number of false positives; 2) they need training data to build the detection model, and an appropriate time window size and a corresponding threshold have to be set empirically; 3) their time and space overheads are usually very high. To address these limitations, we propose in this paper a fractal-model-based approach to detecting anomalies that change the underlying data distribution. Both a history-based algorithm and a parameter-free algorithm are introduced. We show that the latter method consumes only limited memory and does not involve any training process. Theoretical analyses of the algorithm are presented. Experimental results on real-life data sets indicate that, compared with existing anomaly detection methods, our algorithm achieves higher precision with lower space and time complexity.
