Integrating Novel Class Detection with Classification for Concept-Drifting Data Streams

In a typical data stream classification task, it is assumed that the total number of classes is fixed. This assumption may not hold in a real streaming environment, where new classes may emerge. Traditional data stream classification techniques cannot recognize novel class instances until the appearance of the novel class is manually identified and labeled instances of that class are presented to the learning algorithm for training. The problem becomes more challenging in the presence of concept drift, in which the underlying data distribution changes over time. We propose a novel and efficient technique that automatically detects the emergence of a novel class in the presence of concept drift by quantifying the cohesion among unlabeled test instances and the separation of those test instances from the training instances. Our approach is non-parametric, meaning that it does not assume any underlying distribution of the data. Comparison with state-of-the-art stream classification techniques demonstrates the superiority of our approach.
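The cohesion and separation idea can be illustrated with a minimal sketch. This is not the paper's algorithm: the silhouette-style score, the function names, and the parameters `q` and `min_support` are all assumptions chosen for illustration. Cohesion is taken as the mean distance from a test instance to its `q` nearest fellow test instances, and separation as the mean distance to its `q` nearest training instances. Note that this naive score would also flag a compact group of test points lying inside a known class, so a practical detector would presumably first restrict attention to instances that fall outside the region covered by the training data.

```python
import math

def mean_q_nearest(point, others, q):
    """Mean Euclidean distance from `point` to its q nearest points in `others`."""
    dists = sorted(math.dist(point, o) for o in others)
    nearest = dists[:q]
    return sum(nearest) / len(nearest)

def novelty_score(idx, test, train, q=3):
    """Silhouette-style score in [-1, 1] (an illustrative assumption).

    Positive: the instance is closer to other unlabeled test instances
    (cohesion) than to the training instances (separation).
    """
    point = test[idx]
    peers = test[:idx] + test[idx + 1:]
    cohesion = mean_q_nearest(point, peers, q)
    separation = mean_q_nearest(point, train, q)
    denom = max(cohesion, separation)
    return 0.0 if denom == 0 else (separation - cohesion) / denom

def looks_novel(test, train, q=3, min_support=3):
    """Flag a possible novel class when enough test instances score positive."""
    positives = sum(
        1 for i in range(len(test)) if novelty_score(i, test, train, q) > 0
    )
    return positives >= min_support

# Hypothetical 2-D data: two known classes around (0, 0) and (10, 10).
train = [(0, 0), (0, 1), (1, 0), (1, 1),
         (10, 10), (10, 11), (11, 10), (11, 11)]

# Test instances forming a tight group far from both known classes.
novel_batch = [(20, 0), (20, 1), (21, 0), (21, 1)]

# Test instances scattered across the known classes.
known_batch = [(0.5, 0.5), (10.5, 10.5), (0.5, 0.6)]

novel_flag = looks_novel(novel_batch, train)
known_flag = looks_novel(known_batch, train)
```

The score uses only pairwise distances and makes no assumption about the form of the data distribution, which is in the spirit of the non-parametric claim above.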
