Semi Supervised Adaptive Framework for Classifying Evolving Data Stream

Most approaches for classifying evolving data streams divide the stream into fixed-size chunks to address the problems of infinite length and concept drift. These approaches suffer from a trade-off between performance and sensitivity. To address this problem, existing adaptive sliding-window techniques determine chunk boundaries dynamically by detecting changes in the classifier error rate, which requires true labels for all data instances. However, true labels are scarce and often delayed in practice. In this paper, we propose an approach that determines dynamic chunk boundaries by detecting significant changes in classifier confidence scores, using only a limited number of labeled data instances. Moreover, we integrate a suitable classification technique with it to form a complete semi-supervised framework that uses dynamic chunk boundaries to address concept drift and concept evolution efficiently. Experimental results on benchmark data sets show the effectiveness of the proposed framework in handling both concept drift and concept evolution.
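To make the idea of confidence-driven chunk boundaries concrete, the following is a minimal sketch and not the method proposed in the paper. It assumes each confidence score lies in [0, 1] (for example, the largest predicted class probability) and compares the mean of the most recent scores against the mean of the older scores in the current chunk, using a Hoeffding-style threshold in the spirit of adaptive windowing. The function name `find_chunk_boundaries` and the parameters `min_window` and `delta` are hypothetical, chosen only for illustration. It does not cover how the limited labeled instances are used.

```python
import math
from typing import List, Sequence


def find_chunk_boundaries(
    confidences: Sequence[float],
    min_window: int = 30,   # assumed size of the "recent" comparison window
    delta: float = 0.01,    # assumed significance level for the threshold
) -> List[int]:
    """Return stream indices at which a new chunk is declared to start.

    Illustrative only: a change is flagged when the mean confidence of the
    last `min_window` scores differs from the mean of the older scores in
    the current chunk by more than a Hoeffding-style bound.
    """
    boundaries: List[int] = []
    start = 0  # index where the current chunk begins

    for t in range(len(confidences)):
        chunk_len = t + 1 - start
        if chunk_len < 2 * min_window:
            continue  # not enough data to compare two sub-windows

        older = confidences[start : t + 1 - min_window]
        recent = confidences[t + 1 - min_window : t + 1]
        mean_old = sum(older) / len(older)
        mean_new = sum(recent) / len(recent)

        # Threshold for the difference of two means of values in [0, 1];
        # m is the harmonic-mean-style effective sample size.
        m = 1.0 / (1.0 / len(older) + 1.0 / len(recent))
        eps = math.sqrt(math.log(2.0 / delta) / (2.0 * m))

        if abs(mean_old - mean_new) > eps:
            # Simplification: place the boundary at the start of the
            # recent window and begin a new chunk there.
            start = t + 1 - min_window
            boundaries.append(start)

    return boundaries
```

Calling `find_chunk_boundaries` on a stream whose confidence drops abruptly should report a boundary shortly before or around the drop, while a stream with constant confidence should report none. The boundary placement is deliberately crude, and recomputing the means at every step is quadratic in the chunk length, so a practical detector would maintain running sums instead.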
