Mining Multi-label Concept-Drifting Data Streams Using Dynamic Classifier Ensemble

The problem of mining single-label data streams has been studied extensively in recent years, whereas mining multi-label data streams has received far less attention. In this paper, we propose an improved binary relevance method that exploits dependence information among class labels, and a dynamic classifier ensemble approach for classifying multi-label concept-drifting data streams. Our classification algorithm combines the ensemble members by weighted majority voting. Our empirical study on both synthetic and real-life data sets shows that the proposed dynamic classifier ensemble with improved binary relevance outperforms both the dynamic classifier ensemble with standard binary relevance and the static classifier ensemble with standard binary relevance.
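To make the voting step concrete, the following is a minimal sketch of weighted majority voting over a binary relevance decomposition, in which each ensemble member casts one 0/1 vote per label. It is an illustration under stated assumptions, not the algorithm evaluated in this paper: the function name `weighted_majority_vote`, the `Member` type, the weights, and the toy members are all hypothetical, and the sketch does not show how the weights are chosen or how the improved binary relevance method uses label dependence.

```python
from typing import Callable, List, Sequence

# Hypothetical ensemble member: maps a feature vector to one 0/1 vote per label
# (a binary relevance decomposition yields one binary decision per label).
Member = Callable[[Sequence[float]], List[int]]


def weighted_majority_vote(
    members: Sequence[Member],
    weights: Sequence[float],
    x: Sequence[float],
    n_labels: int,
) -> List[int]:
    """Predict a label set by weighted majority voting, label by label.

    A label is predicted as relevant when the total weight of the members
    voting for it exceeds half of the total ensemble weight.
    """
    if len(members) != len(weights):
        raise ValueError("members and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("the ensemble weights must sum to a positive value")

    scores = [0.0] * n_labels
    for member, weight in zip(members, weights):
        votes = member(x)
        for j in range(n_labels):
            scores[j] += weight * votes[j]

    return [1 if score > total / 2 else 0 for score in scores]


# Toy members that ignore the input and always cast the same votes (assumption).
members = [
    lambda x: [1, 0, 1],
    lambda x: [0, 1, 1],
    lambda x: [0, 1, 0],
]
weights = [3.0, 1.0, 1.0]  # assumed weights, e.g. from recent accuracy

prediction = weighted_majority_vote(members, weights, [0.2, 0.7], n_labels=3)
print(prediction)
```

In this toy run the first member carries most of the weight, so its votes decide the first two labels, while the third label is accepted because two members agree on it. In a dynamic ensemble the weights would be recomputed as the stream evolves; how that is done is left open here.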
