On the reliable detection of concept drift from streaming unlabeled data

New classifier-independent, dynamic, unsupervised approach for detecting concept drift. Reduced number of false alarms and increased relevance of drift detection. Results comparable to supervised approaches, which require fully labeled streams. Our approach generalizes the notion of margin density as a signal to detect drifts. Experiments on cybersecurity datasets show efficacy for detecting adversarial drifts.

Classifiers deployed in the real world operate in a dynamic environment, where the data distribution can change over time. These changes, referred to as concept drift, can cause the predictive performance of the classifier to drop over time, thereby making it obsolete. To be of any real use, these classifiers need to detect drifts and adapt to them over time. Detecting drifts has traditionally been approached as a supervised task, with labeled data constantly being used to validate the learned model. Although effective in detecting drifts, these techniques are impractical, as labeling is a difficult, costly and time-consuming activity. On the other hand, unsupervised change detection techniques are unreliable, as they produce a large number of false alarms. The inefficacy of the unsupervised techniques stems from the exclusion of the characteristics of the learned classifier from the detection process. In this paper, we propose the Margin Density Drift Detection (MD3) algorithm, which tracks the number of samples in the uncertainty region of a classifier as a metric to detect drift. The MD3 algorithm is a distribution-independent, application-independent, model-independent, unsupervised and incremental algorithm for reliably detecting drifts from data streams. Experimental evaluation on 6 drift-induced datasets and 4 additional datasets from the cybersecurity domain demonstrates that the MD3 approach can reliably detect drifts, with significantly fewer false alarms than unsupervised feature-based drift detectors. At the same time, it produces performance comparable to that of a fully labeled drift detector. The reduction in false alarms enables the signaling of drifts only when they are most likely to affect classification performance. As such, the MD3 approach leads to a detection scheme which is credible, label-efficient and general in its applicability.
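The idea of tracking samples in a classifier's uncertainty region can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a fixed linear classifier whose uncertainty region is the band of absolute decision scores at or below 1 (as with an SVM margin), and it flags drift when the margin density of a new unlabeled window deviates from a reference value by more than a hypothetical `threshold`. The names `margin_density`, `detect_drift`, `make_window`, and all numeric values are assumptions chosen for illustration.

```python
import numpy as np

def margin_density(scores, margin=1.0):
    """Fraction of samples whose absolute decision score lies inside the margin."""
    scores = np.asarray(scores, dtype=float)
    return float(np.mean(np.abs(scores) <= margin))

def detect_drift(reference_md, window_scores, threshold=0.1, margin=1.0):
    """Flag drift when a window's margin density departs from the reference.

    `threshold` is a hypothetical tolerance, not a value from the paper.
    Returns (drift_flag, window_margin_density).
    """
    md = margin_density(window_scores, margin)
    return abs(md - reference_md) > threshold, md

# Hypothetical fixed linear classifier: score(x) = w . x + b
w, b = np.array([1.0, -1.0]), 0.0
rng = np.random.default_rng(0)

def make_window(shift, n=500):
    """Two Gaussian classes centred at +/-(shift, -shift); labels are discarded."""
    y = rng.integers(0, 2, n) * 2 - 1
    return rng.normal(0.0, 0.5, (n, 2)) + np.outer(y, [shift, -shift])

# Reference window: classes are well separated, so few samples fall in the margin.
reference_md = margin_density(make_window(2.0) @ w + b)

# Later window: classes have moved toward the boundary, so the margin fills up.
drifted, drift_md = detect_drift(reference_md, make_window(0.5) @ w + b)

# A window from the original distribution should not raise an alarm.
stable, stable_md = detect_drift(reference_md, make_window(2.0) @ w + b)

print(f"reference={reference_md:.3f} drifted={drift_md:.3f} stable={stable_md:.3f}")
print("drift flagged:", drifted, "| false alarm on stable window:", stable)
```

No labels are used at any point: only the classifier's scores on incoming samples are needed, which is what makes a signal of this kind unsupervised. Because the signal depends on the learned decision boundary, a change in the data that leaves the margin untouched would not trigger it, which is consistent with the abstract's claim of fewer false alarms than feature-based detectors.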
