SACCOS: A Semi-Supervised Framework for Emerging Class Detection and Concept Drift Adaption Over Data Streams

In this paper, we address the challenge of detecting instances from emerging classes in a non-stationary data stream during classification. In particular, instances from an entirely unknown class may appear in a data stream over time. Existing classification techniques use unsupervised clustering to identify the emergence of such instances. Unfortunately, they make strong assumptions that are typically invalid in practice: (i) most instances associated with a class are closer to each other in feature space than to instances associated with different classes, (ii) covariates of the data are normalized through an oracle to overcome the effect of a few instances having large feature values, and (iii) labels of instances from emerging classes are readily available soon after detection. To address the challenges that arise in practice when these assumptions do not fully hold, i.e., instances of each class are scattered and the true labels of novel class instances are sparsely available, we propose a practical semi-supervised emerging class detection framework. Specifically, we identify similar data instances within local regions of feature space by incorporating a mutual graph clustering mechanism. We also perform online normalization along the data stream instead of assuming an oracle, and propose a classification technique that uses only a small number of true labels for training and emerging class detection. Our empirical evaluation of this framework on real-world datasets demonstrates superior classification performance compared to existing methods, while using significantly fewer labeled instances.
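
To illustrate what normalizing along the stream without an oracle can look like, the following is a minimal sketch that assumes per-feature z-scoring with running statistics maintained by Welford's algorithm. The abstract does not specify the normalization scheme used in the framework, so the class name `OnlineStandardizer`, its methods, and the toy stream are hypothetical and chosen only for illustration.

```python
import math


class OnlineStandardizer:
    """Hypothetical helper: running per-feature z-scoring (Welford's algorithm).

    This is an assumed scheme for illustration, not the framework's own method.
    """

    def __init__(self, n_features, eps=1e-8):
        self.n = 0                        # number of instances seen so far
        self.mean = [0.0] * n_features    # running mean per feature
        self.m2 = [0.0] * n_features      # running sum of squared deviations
        self.eps = eps                    # guards against division by zero

    def update(self, x):
        """Fold one instance into the running statistics."""
        self.n += 1
        for j, v in enumerate(x):
            delta = v - self.mean[j]
            self.mean[j] += delta / self.n
            self.m2[j] += delta * (v - self.mean[j])

    def transform(self, x):
        """Scale one instance using only the statistics observed so far."""
        if self.n < 2:
            # Variance is undefined with fewer than two instances.
            return [0.0] * len(x)
        out = []
        for j, v in enumerate(x):
            std = math.sqrt(self.m2[j] / (self.n - 1))
            out.append((v - self.mean[j]) / (std + self.eps))
        return out


# Toy stream whose second feature has a much larger scale than the first.
stream = [[1.0, 100.0], [2.0, 300.0], [3.0, 200.0], [4.0, 400.0]]
scaler = OnlineStandardizer(n_features=2)
normalized = []
for x in stream:
    scaler.update(x)                      # no pass over future data is needed
    normalized.append(scaler.transform(x))
```

Each instance is scaled with statistics computed only from the data seen up to that point, so no global minimum, maximum, or variance has to be known in advance. Under concept drift, a windowed or exponentially decayed variant of the same statistics may be preferable; that choice is likewise an assumption here.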