Mining Multi-label Concept-Drifting Streams Using Ensemble Classifiers

The problem of mining single-label data streams has been extensively studied in recent years. However, not enough attention has been paid to the problem of mining multi-label data streams. In this paper, a weighted voting ensemble approach is proposed to tackle this problem. We partition the incoming data stream into sequential chunks, and use binary relevance method to transform each chunk into a set of single-label chunks, which could be learned by binary classification algorithm. We train an ensemble of classifiers from the transformed chunks, and the classifiers in the ensemble are weighted based on their expected classification accuracy on the test data under the time-evolving environment. We also proposed a method for simulating multi-label data stream with concept drifting. Our empirical study on synthetic data set shows that the proposed approach has substantial advantage over majority voting ensemble approach.

[1]  Koby Crammer,et al.  A Family of Additive Online Algorithms for Category Ranking , 2003, J. Mach. Learn. Res..

[2]  Philip S. Yu,et al.  Mining concept-drifting data streams using ensemble classifiers , 2003, KDD '03.

[3]  Gerhard Widmer,et al.  Learning in the presence of concept drift and hidden contexts , 2004, Machine Learning.

[4]  Jason Weston,et al.  A kernel method for multi-labelled classification , 2001, NIPS.

[5]  Eyke Hüllermeier,et al.  A Unified Model for Multilabel Classification and Ranking , 2006, ECAI.

[6]  Sunita Sarawagi,et al.  Discriminative Methods for Multi-labeled Classification , 2004, PAKDD.

[7]  Alexey Tsymbal,et al.  The problem of concept drift: definitions and related work , 2004 .

[8]  Zhi-Hua Zhou,et al.  Multilabel Neural Networks with Applications to Functional Genomics and Text Categorization , 2006, IEEE Transactions on Knowledge and Data Engineering.

[9]  Marcus A. Maloof,et al.  Dynamic weighted majority: a new ensemble method for tracking concept drift , 2003, Third IEEE International Conference on Data Mining.

[10]  Yoram Singer,et al.  BoosTexter: A Boosting-based System for Text Categorization , 2000, Machine Learning.

[11]  William Nick Street,et al.  A streaming ensemble algorithm (SEA) for large-scale classification , 2001, KDD '01.

[12]  Xindong Wu,et al.  Dynamic classifier selection for effective mining from noisy data streams , 2004, Fourth IEEE International Conference on Data Mining (ICDM'04).

[13]  Grigorios Tsoumakas,et al.  Multi-Label Classification: An Overview , 2007, Int. J. Data Warehous. Min..

[14]  Tat-Seng Chua,et al.  A maximal figure-of-merit learning approach to text categorization , 2003, SIGIR.

[15]  Xiaoming Jin,et al.  An automatic construction and organization strategy for ensemble learning on data streams , 2006, SGMD.

[16]  Mykola Pechenizkiy,et al.  Handling Local Concept Drift with Dynamic Integration of Classifiers: Domain of Antibiotic Resistance in Nosocomial Infections , 2006, 19th IEEE Symposium on Computer-Based Medical Systems (CBMS'06).

[17]  Quanyuan Wu,et al.  Mining Concept-Drifting and Noisy Data Streams Using Ensemble Classifiers , 2009, 2009 International Conference on Artificial Intelligence and Computational Intelligence.

[18]  Geoff Hulten,et al.  Mining time-changing data streams , 2001, KDD '01.

[19]  Zhi-Hua Zhou,et al.  A k-nearest neighbor based algorithm for multi-label classification , 2005, 2005 IEEE International Conference on Granular Computing.

[20]  Amanda Clare,et al.  Knowledge Discovery in Multi-label Phenotype Data , 2001, PKDD.