Detection of Concept Drift for Learning from Stream Data

In data processing under dynamic environments such as data streams, time is one of the most significant factors, not only because the volume of data grows dramatically but also because the context of the data can vary over time. To learn effectively from data that evolves over time, it is necessary to detect drift in the underlying concept. We present a method that detects concept drift by using correlation information about the value distributions, and we apply it to a learning task on a multi-stream data model. Experiments on a synthetic data set show that our approach can provide a reasonable threshold for detecting changes between windowed batches of stream data.
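The abstract does not spell out how the correlation of value distributions is computed, so the following is only a minimal sketch of one plausible reading, not the authors' actual method. It assumes that each windowed batch is summarized by a normalized equal-width histogram, that consecutive batches are compared with the Pearson correlation of their histograms, and that drift is flagged when the correlation falls below a fixed threshold. The function names, the bin count, the value range, and the threshold of 0.8 are all hypothetical choices made for illustration.

```python
import math
import random


def histogram(values, lo, hi, bins):
    """Normalized equal-width histogram of `values` over [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        # Clamp the index so out-of-range values land in the edge bins.
        idx = min(max(int((v - lo) / width), 0), bins - 1)
        counts[idx] += 1
    total = len(values)
    return [c / total for c in counts]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        # Assumption: a perfectly flat histogram carries no shape
        # information, so it is treated as uncorrelated.
        return 0.0
    return cov / math.sqrt(vx * vy)


def detect_drift(stream, window, lo, hi, bins=12, threshold=0.8):
    """Return the start index of every non-overlapping window whose
    histogram correlates with the previous window's histogram below
    `threshold` (hypothetical default, not taken from the paper)."""
    drift_points = []
    prev = None
    for start in range(0, len(stream) - window + 1, window):
        cur = histogram(stream[start:start + window], lo, hi, bins)
        if prev is not None and pearson(prev, cur) < threshold:
            drift_points.append(start)
        prev = cur
    return drift_points


# Synthetic stream (illustrative only): the mean shifts from 0 to 3
# halfway through, which stands in for a change of concept.
random.seed(0)
stream = [random.gauss(0.0, 1.0) for _ in range(1000)]
stream += [random.gauss(3.0, 1.0) for _ in range(1000)]
print(detect_drift(stream, window=500, lo=-4.0, hi=8.0))
```

In this sketch, windows drawn from the same distribution produce nearly identical histograms and a correlation close to 1, while the window that follows the mean shift produces a histogram whose mass sits in different bins, so one would expect only the window starting at index 1000 to be flagged. The threshold trades sensitivity against false alarms and would have to be tuned for real data.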
