Mining Data Streams with Skewed Distribution by Static Classifier Ensemble

In many data stream applications, the class distribution is imbalanced. However, the data stream mining research community has focused mainly on balanced data streams and has paid little attention to mining skewed data streams. In this paper, we propose a clustering-sampling based ensemble algorithm with weighted majority voting for learning from skewed data streams. We conducted experiments on a synthetic data set simulating skewed data streams. The experimental results show that clustering-sampling outperforms under-sampling and that, compared with using a single window, the proposed ensemble-based algorithm achieves better classification performance.
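
The two ingredients named above, clustering-sampling and weighted majority voting, can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes that clustering-sampling means running k-means on the majority class and keeping one centroid per minority example, and that each ensemble member carries a non-negative weight. The function names (`kmeans_centroids`, `clustering_sample`, `weighted_majority_vote`) and all parameters are hypothetical.

```python
import random
from collections import defaultdict


def kmeans_centroids(points, k, iters=20, seed=0):
    """Plain k-means (Lloyd's algorithm); returns k centroids as lists of floats."""
    if k == 0:
        return []
    rng = random.Random(seed)
    # Initialise the centroids with k distinct points drawn from the data.
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to the centroid with the smallest squared distance.
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def clustering_sample(majority, minority, seed=0):
    """Assumed scheme: shrink the majority class to as many centroids as
    there are minority examples, so the two classes become balanced."""
    k = min(len(minority), len(majority))
    return kmeans_centroids(list(majority), k, seed=seed), list(minority)


def weighted_majority_vote(predictions, weights):
    """Return the label whose supporting classifiers have the largest total weight."""
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)
```

Under these assumptions, each classifier in the ensemble would be trained on the output of `clustering_sample` for one chunk of the stream, and `weighted_majority_vote` would combine their predictions for a new example. How the weights are chosen is left open here; the sketch only requires one weight per ensemble member.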