Querying Temporal Drifts at Multiple Granularities

There exists a large body of work on online drift detection with the goal of dynamically finding and maintaining changes in data streams. In this paper, we adopt a query-based approach to drift detection. Our approach relies on a drift index, a structure that captures drift at different time granularities and enables flexible drift queries. We formalize different drift queries that represent real-world scenarios and develop query evaluation algorithms that use different materializations of the drift index as well as strategies for online index maintenance. We describe a thorough study of the performance of our algorithms on real-world and synthetic datasets with varying change rates.

[1]  Gerhard Widmer,et al.  Learning in the Presence of Concept Drift and Hidden Contexts , 1996, Machine Learning.

[2]  Philip S. Yu,et al.  A Framework for Clustering Evolving Data Streams , 2003, VLDB.

[3]  J. A. Hartigan,et al.  A k-means clustering algorithm , 1979 .

[4]  Shai Ben-David,et al.  Detecting Change in Data Streams , 2004, VLDB.

[5]  Hans-Peter Kriegel,et al.  A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise , 1996, KDD.

[6]  Nathalie Japkowicz,et al.  A Novelty Detection Approach to Classification , 1995, IJCAI.

[7]  João Gama,et al.  Change Detection in Learning Histograms from Data Streams , 2007, EPIA Workshops.

[8]  Abraham Bernstein,et al.  Entropy-based Concept Shift Detection , 2006, Sixth International Conference on Data Mining (ICDM'06).

[9]  Sudipto Guha,et al.  Streaming-data algorithms for high-quality clustering , 2002, Proceedings 18th International Conference on Data Engineering.

[10]  Grigorios Tsoumakas,et al.  On the Utility of Incremental Feature Selection for the Classification of Textual Data Streams , 2005, Panhellenic Conference on Informatics.

[11]  Anton Dries,et al.  Adaptive concept drift detection , 2009, SDM.

[12]  Albert Bifet,et al.  Adaptive Stream Mining: Pattern Learning and Mining from Evolving Data Streams , 2010, Frontiers in Artificial Intelligence and Applications.

[13]  Geoff Holmes,et al.  MOA: Massive Online Analysis , 2010, J. Mach. Learn. Res..

[14]  Marie-Luce Picard,et al.  Density estimation on data stream : an application to change detection , 2010, EGC.

[15]  Grigorios Tsoumakas,et al.  Dynamic Feature Space and Incremental Feature Selection for the Classification of Textual Data Streams , 2006 .

[16]  S. Venkatasubramanian,et al.  An Information-Theoretic Approach to Detecting Changes in Multi-Dimensional Data Streams , 2006 .

[17]  João Gama,et al.  Learning with Drift Detection , 2004, SBIA.

[18]  E. S. Page CONTINUOUS INSPECTION SCHEMES , 1954 .

[19]  Bhavani M. Thuraisingham,et al.  A Multi-partition Multi-chunk Ensemble Technique to Classify Concept-Drifting Data Streams , 2009, PAKDD.

[20]  Tian Zhang,et al.  BIRCH: an efficient data clustering method for very large databases , 1996, SIGMOD '96.

[21]  Aoying Zhou,et al.  Density-Based Clustering over an Evolving Data Stream with Noise , 2006, SDM.

[22]  G. G. Stokes "J." , 1890, The New Yale Book of Quotations.