Correlating burst events on streaming stock market data

We address the problem of monitoring and identifying correlated burst patterns in multi-stream time series databases. We follow a two-step methodology: first we identify the burst sections in the data, and then we store them in an efficient in-memory index for easy retrieval. The burst detection scheme imposes a variable threshold on the examined data and takes advantage of the skewed distribution that is typically encountered in many applications. The detected bursts are compacted into burst intervals and stored in an interval index. The index facilitates the identification of correlated bursts by performing very efficient overlap operations on the stored burst regions. We present the merits of the proposed indexing scheme through a thorough analysis of its complexity. We also show the real-time response of our burst indexing technique, and demonstrate the usefulness of the approach for correlating surprising volume trading events using historical stock data from the New York Stock Exchange. While the focus of this work is on financial data, the proposed methods and data structures can find applications in anomaly or novelty detection for telecommunication, network traffic, and medical data.
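The two-step methodology can be illustrated with a minimal Python sketch. It is not the implementation described in this work: the trailing-window threshold (mean plus k standard deviations), the function names, and the parameters `window` and `k` are all assumptions chosen for illustration, and a plain linear scan stands in for the interval index.

```python
from statistics import mean, pstdev

def detect_bursts(values, window=4, k=2.0):
    """Flag points that exceed a variable threshold.

    Assumption: the threshold at position i is mean + k * std of the
    previous `window` values, so it adapts as the stream evolves.
    """
    flags = [False] * len(values)
    for i in range(window, len(values)):
        recent = values[i - window:i]
        cutoff = mean(recent) + k * pstdev(recent)
        flags[i] = values[i] > cutoff
    return flags

def compact(flags):
    """Merge consecutive flagged positions into inclusive (start, end) intervals."""
    intervals, start = [], None
    for i, flagged in enumerate(flags):
        if flagged and start is None:
            start = i
        elif not flagged and start is not None:
            intervals.append((start, i - 1))
            start = None
    if start is not None:
        intervals.append((start, len(flags) - 1))
    return intervals

def overlap(a, b):
    """Number of time points shared by two inclusive intervals (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)

def correlated(query, stored):
    """Hypothetical stand-in for the interval index: scan all stored
    (stream_id, interval) pairs and keep those overlapping the query."""
    return [(sid, iv) for sid, iv in stored if overlap(query, iv) > 0]

# Usage: detect bursts in one volume series, then look up overlapping
# bursts recorded for other (made-up) streams.
volume = [1, 1, 1, 1, 1, 10, 10, 1, 1, 1]
bursts = compact(detect_bursts(volume))
stored = [("A", (2, 5)), ("B", (7, 9))]
matches = [correlated(b, stored) for b in bursts]
```

Note that an adaptive threshold of this kind rises once a burst enters the trailing window, so only the onset of a sustained burst may be flagged. A real index would also replace the linear scan in `correlated` with a structure that supports sublinear overlap queries.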
