SAND: Semi-Supervised Adaptive Novel Class Detection and Classification over Data Stream

Most approaches to classifying data streams either divide the stream into fixed-size chunks or use gradual forgetting. Because data streams evolve, choosing a suitable chunk size or forgetting rate without prior knowledge of the time scale of change is not a trivial task. These approaches therefore suffer from a trade-off between performance and sensitivity. Existing approaches based on dynamic sliding windows address this problem by tracking changes in the classifier error rate, but they are supervised in nature. In this paper, we propose an efficient semi-supervised framework that uses change detection on classifier confidence to detect concept drift and to determine chunk boundaries dynamically. It also addresses the concept evolution problem by detecting outliers that have strong cohesion among themselves. Experimental results on benchmark and synthetic data sets show the effectiveness of the proposed approach.
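The idea of placing chunk boundaries where classifier confidence changes can be illustrated with a minimal sketch. This is not the change detection test used by SAND, which the abstract does not specify; it is a simple stand-in that compares the mean confidence of a reference window against the most recent window. The class name `ConfidenceDriftDetector`, the function `chunk_boundaries`, and the parameters `window` and `drop` are all hypothetical and chosen only for illustration.

```python
from collections import deque
from statistics import mean


class ConfidenceDriftDetector:
    """Hypothetical detector: flags a drop in mean classifier confidence.

    Assumption: a simple mean-difference test stands in for whatever
    change detection test the framework actually uses.
    """

    def __init__(self, window=20, drop=0.2):
        self.window = window                 # size of both windows
        self.drop = drop                     # confidence drop that signals drift
        self.reference = []                  # first `window` scores of the chunk
        self.recent = deque(maxlen=window)   # sliding window of latest scores

    def update(self, confidence):
        """Feed one confidence in [0, 1]; return True at a chunk boundary."""
        if len(self.reference) < self.window:
            self.reference.append(confidence)
            return False
        self.recent.append(confidence)
        if len(self.recent) < self.window:
            return False
        if mean(self.reference) - mean(self.recent) > self.drop:
            # Drift signalled: close the chunk and start a new reference window.
            self.reference = []
            self.recent.clear()
            return True
        return False


def chunk_boundaries(confidences, window=20, drop=0.2):
    """Return the stream indices at which a chunk boundary is signalled."""
    detector = ConfidenceDriftDetector(window, drop)
    return [i for i, c in enumerate(confidences) if detector.update(c)]
```

For a stream whose confidence falls from 0.9 to 0.5 at index 50, `chunk_boundaries([0.9] * 50 + [0.5] * 50)` signals a single boundary shortly after the drop, while a stream with constant confidence signals none. No true labels are consulted, which is the property that separates a confidence-based test from one based on error rate.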
