Tracking the Intrinsic Dimension of Evolving Data Streams to Update Association Rules

Data streams can change their behavior over time, and when a significant change occurs, the rules governing the attributes reported by each event can change as well. Moreover, a data stream can comprise events from several classes, and the rules governing the events of each class can also change depending on the actual properties of the data. In this paper we propose a new technique that continuously identifies the attributes most relevant to characterizing each class, based on the general properties the data stream exhibits as it evolves over time.
