Resampling Techniques for Learning Under Extreme Verification Latency with Class Imbalance

A common, yet rarely addressed, real-world problem in computational intelligence applications is learning from non-stationary streaming data, where the underlying distribution of the data changes over time. This problem, also referred to as concept drift, becomes even more challenging if, after an initial small set of labeled data, the stream consists only of unlabeled data, requiring the learner to adapt to the changing underlying distribution without the benefit of new labels. This scenario is typically referred to as learning in an initially labeled non-stationary environment, or as extreme verification latency (EVL), reflecting the fact that label verification of the test data is indefinitely delayed. In our prior work, we noted that current EVL algorithms, including our own COMPOSE algorithm, are largely unable to track changing distributions if the data drawn from those distributions are even mildly imbalanced. In this work, we integrate COMPOSE with 13 different resampling-based rebalancing algorithms and compare their accuracy, F1 score, and execution time. The results differed from what we originally expected and provide unique insight into how to choose a data rebalancing approach for different types of drift.
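To make the integration concrete, below is a minimal sketch of one time step of a COMPOSE-style EVL loop with a resampling stage, using SMOTE from the imbalanced-learn toolbox (one of the resampling methods compared in this line of work). This is not the authors' implementation: the `extract_core_supports` helper is a hypothetical stand-in for COMPOSE's actual core support extraction, and a plain SVM replaces the semi-supervised learner.

```python
# Sketch of one time step of a COMPOSE-style extreme-verification-latency
# loop with a data-rebalancing stage. Only the resampling call reflects a
# real library API; the core-support step is a simplified stand-in.
import numpy as np
from imblearn.over_sampling import SMOTE  # resampler under comparison
from sklearn.svm import SVC               # simplified base classifier

def extract_core_supports(X, y, fraction=0.5):
    """Hypothetical stand-in for COMPOSE's core support extraction:
    keep the fraction of points per class closest to the class mean."""
    keep = []
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        dists = np.linalg.norm(X[idx] - X[idx].mean(axis=0), axis=1)
        keep.extend(idx[np.argsort(dists)[: max(1, int(fraction * len(idx)))]])
    return X[keep], y[keep]

def compose_step(X_labeled, y_labeled, X_unlabeled):
    # 1) Rebalance the (possibly imbalanced) labeled set before training.
    X_bal, y_bal = SMOTE(k_neighbors=3).fit_resample(X_labeled, y_labeled)
    # 2) Train on the rebalanced data and pseudo-label the unlabeled batch.
    clf = SVC(gamma="scale").fit(X_bal, y_bal)
    y_pseudo = clf.predict(X_unlabeled)
    # 3) Extract core supports to serve as the labeled set at the next step.
    return extract_core_supports(X_unlabeled, y_pseudo)
```

Because imbalanced-learn resamplers share the same `fit_resample` interface, the SMOTE call can be swapped for under-sampling methods such as `TomekLinks`, `CondensedNearestNeighbour`, or `OneSidedSelection` without changing the rest of the loop, which is what makes a side-by-side comparison of many rebalancing strategies practical.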
