Mining Data Streams with Skewed Distribution based on Ensemble Method

In recent years, there have been several interesting studies on predictive modeling in data streams. However, most such studies assume relatively balanced and stable data streams and cannot handle skewed distributions (e.g., few positives but many negatives) well, even though such distributions are typical in many data stream applications. In this paper, we propose an ensemble- and cluster-based sampling method to deal with this situation. The study shows that this method is effective for mining skewed data streams.
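The abstract names the approach but gives no algorithmic detail, so the following is only a minimal sketch of how an ensemble- and cluster-based sampling scheme for a skewed stream might look, not the authors' implementation. Everything specific here is an assumption made for illustration: the stream arrives in chunks, minority (positive) examples are retained across chunks, the majority class in each chunk is undersampled by replacing it with k-means centroids, each ensemble member is a nearest-prototype classifier, and the names `kmeans` and `SkewedStreamEnsemble` are hypothetical.

```python
import random
from collections import deque


def dist2(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of feature tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns at most k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, min(k, len(points)))
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: dist2(p, centroids[i]))
            groups[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
    return centroids


class SkewedStreamEnsemble:
    """Hypothetical chunk-based ensemble for a two-class skewed stream."""

    def __init__(self, max_members=5):
        self.positives = []                       # minority examples kept across chunks (assumption)
        self.members = deque(maxlen=max_members)  # oldest member is dropped first

    def learn_chunk(self, chunk):
        """chunk: list of (features, label) pairs; label 1 is the minority class."""
        self.positives += [x for x, y in chunk if y == 1]
        negatives = [x for x, y in chunk if y == 0]
        if not self.positives or not negatives:
            return
        # Cluster-based undersampling: summarize the negatives with as many
        # centroids as there are retained positives, giving a balanced set.
        neg_protos = kmeans(negatives, k=len(self.positives))
        self.members.append((list(self.positives), neg_protos))

    def predict(self, x):
        """Majority vote of nearest-prototype members; ties and an empty ensemble give 0."""
        votes = 0
        for pos, neg in self.members:
            d_pos = min(dist2(x, p) for p in pos)
            d_neg = min(dist2(x, n) for n in neg)
            votes += 1 if d_pos < d_neg else -1
        return 1 if votes > 0 else 0
```

In this sketch, balancing happens per chunk rather than over the whole stream, so each member is trained on roughly equal numbers of positive examples and negative prototypes, and the bounded `deque` lets older members age out as the stream evolves.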
