Unsupervised Concept Drift Detection with a Discriminative Classifier

In data stream mining, one of the biggest challenges is to develop algorithms that deal with changing data. As data evolve over time, static models become outdated. This phenomenon is called concept drift, and it has been investigated extensively in the literature. Detecting and subsequently adapting to concept drift yields more robust, better-performing models. In this study, we present an unsupervised method called D3, which uses a discriminative classifier with a sliding window to detect concept drift by monitoring changes in the feature space. It is a simple method that can be used alongside any existing classifier that does not intrinsically have a drift adaptation mechanism. We compare D3 against the most prevalent concept drift detectors on 8 datasets. The results demonstrate that D3 outperforms these baselines, yielding models with higher performance on both real-world and synthetic datasets.
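The abstract describes D3 only at a high level. The following is a minimal Python sketch of the idea as we read it, not the authors' implementation. A sliding window holds w older samples (label 0) followed by int(w * rho) newer samples (label 1). A discriminative classifier is trained to tell the two apart, and drift is signalled when its cross-validated ROC AUC reaches a threshold tau. The parameters w, rho, tau and folds, the slide-and-reset policy, the stratified 2-fold cross-validation, and the use of a hand-rolled logistic regression as the discriminator are all assumptions made for illustration. The class name D3Sketch is hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def auc(scores, labels):
    """ROC AUC as the Mann-Whitney statistic; tied scores count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_logreg(X, y, epochs=100, lr=0.1):
    """Batch gradient-descent logistic regression (assumed stand-in discriminator)."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    # Return a scoring function: larger score means "looks like new data".
    return lambda x: sum(wj * xj for wj, xj in zip(w, x)) + b


class D3Sketch:
    """Hypothetical D3-style detector; w, rho, tau and folds are illustrative assumptions."""

    def __init__(self, w=100, rho=0.2, tau=0.7, folds=2):
        self.w, self.n_new, self.tau, self.folds = w, int(w * rho), tau, folds
        if min(self.w, self.n_new) < folds:
            raise ValueError("each fold needs at least one old and one new sample")
        self.buffer = []

    def _cv_auc(self):
        X = self.buffer
        y = [0] * self.w + [1] * self.n_new
        # Stratified folds: positions are counted within each class separately.
        fold = [i % self.folds for i in range(self.w)] + [j % self.folds for j in range(self.n_new)]
        aucs = []
        for k in range(self.folds):
            train = [i for i in range(len(X)) if fold[i] != k]
            test = [i for i in range(len(X)) if fold[i] == k]
            score = fit_logreg([X[i] for i in train], [y[i] for i in train])
            aucs.append(auc([score(X[i]) for i in test], [y[i] for i in test]))
        return sum(aucs) / len(aucs)  # mean held-out AUC over folds

    def update(self, x):
        """Add one unlabelled sample; return True if drift is signalled."""
        self.buffer.append(list(x))
        if len(self.buffer) < self.w + self.n_new:
            return False
        if self._cv_auc() >= self.tau:
            # Drift: discard the old part; the new samples start the next old window.
            self.buffer = self.buffer[self.w:]
            return True
        # No drift: slide forward by discarding the oldest n_new samples.
        self.buffer = self.buffer[self.n_new:]
        return False


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stream: the mean of both features shifts from 0 to 3 at t = 300.
    stream = [[rng.gauss(0 if t < 300 else 3, 1) for _ in range(2)] for t in range(600)]
    det = D3Sketch(w=100, rho=0.2, tau=0.7)
    print([t for t, x in enumerate(stream) if det.update(x)])
```

The sketch never looks at class labels, which is what makes the check unsupervised. AUC is used because it does not depend on a decision threshold and is unaffected by the imbalance between the w old and int(w * rho) new samples. A value near 0.5 means the two parts of the window are indistinguishable, while a value near 1.0 means the feature distribution has shifted. In a full pipeline, a True return would be the cue to retrain the downstream classifier on recent data.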