Mining Multi-Label Data Streams Using Ensemble-Based Active Learning

Data stream classification has drawn increasing attention from the data mining community in recent years, and a large number of stream classification models have been proposed. However, most existing models focus only on mining single-label data streams, and mining multi-label data streams has not yet been fully addressed. Although some recent work has touched on the multi-label stream mining problem, it does not consider the expensive cost of labeling, which prevents its use in real-world applications. To this end, we study in this paper the challenging problem of mining multi-label data streams with limited labeling resources. Specifically, we propose an ensemble-based active learning framework to handle the large volume of stream data, the expensive labeling cost, and concept drift on multi-label data streams. Experiments on both synthetic and real-world data sets demonstrate the performance of the proposed method.
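
The abstract does not spell out the algorithm, so the following is only a generic sketch of how ensemble-based active learning on a chunked multi-label stream might look, not the authors' method. It assumes the stream arrives in chunks, that a small labeling budget is spent per chunk on the instances the current committee disagrees about most (a query-by-committee style criterion), and that concept drift is handled simply by keeping only the most recent ensemble members. All names (`CentroidMember`, `disagreement`, `select_queries`, `run_stream`, `oracle`) and the nearest-centroid base learner are hypothetical choices made for illustration.

```python
import random
from collections import deque


def mean(rows):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class CentroidMember:
    """Toy base learner (assumption): one nearest-centroid binary rule per label."""

    def __init__(self, X, Y):
        self.rules = []
        for j in range(len(Y[0])):
            pos = [x for x, y in zip(X, Y) if y[j] == 1]
            neg = [x for x, y in zip(X, Y) if y[j] == 0]
            if not pos or not neg:
                # Label is constant in this sample: predict that constant.
                self.rules.append(1 if pos else 0)
            else:
                self.rules.append((mean(pos), mean(neg)))

    def predict(self, x):
        out = []
        for rule in self.rules:
            if isinstance(rule, int):
                out.append(rule)
            else:
                pos, neg = rule
                out.append(1 if sq_dist(x, pos) <= sq_dist(x, neg) else 0)
        return out


def disagreement(ensemble, x):
    """Sum over labels of p * (1 - p), where p is the share of positive votes."""
    votes = [m.predict(x) for m in ensemble]
    score = 0.0
    for label_votes in zip(*votes):
        p = sum(label_votes) / len(label_votes)
        score += p * (1.0 - p)
    return score


def select_queries(ensemble, chunk, budget, rng):
    """Pick `budget` indices: random if no committee exists yet, else most disputed."""
    idx = list(range(len(chunk)))
    if not ensemble:
        return rng.sample(idx, budget)
    idx.sort(key=lambda i: disagreement(ensemble, chunk[i]), reverse=True)
    return idx[:budget]


def run_stream(chunks, oracle, budget, max_members=3, seed=0):
    """Process chunks in order; return the ensemble and the number of labels requested."""
    rng = random.Random(seed)
    ensemble = deque(maxlen=max_members)  # oldest member is dropped automatically
    n_queries = 0
    for chunk in chunks:
        picked = select_queries(ensemble, chunk, min(budget, len(chunk)), rng)
        X = [chunk[i] for i in picked]
        Y = [oracle(x) for x in X]  # only the selected instances are labeled
        n_queries += len(X)
        ensemble.append(CentroidMember(X, Y))
    return ensemble, n_queries
```

Here `oracle` stands in for the human annotator and returns a 0/1 vector with one entry per label. The per-chunk `budget` caps the labeling cost, and the bounded `deque` is a deliberately crude stand-in for drift handling; a weighted ensemble would be a natural alternative.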
