Co-training Based on Semi-Supervised Ensemble Classification Approach for Multi-label Data Stream

Large volumes of data streams in the form of text and images are emerging in many real-world applications. These streams are often multi-label, have missing labels, and exhibit newly emerging classes, which challenges existing data stream classification algorithms in terms of accuracy, space, and time performance. On the one hand, most data stream classification algorithms are trained on fully labeled, single-label data, whereas real-world streams contain a large amount of unlabeled data and only a few labeled instances, because labels are difficult to obtain. On the other hand, most existing multi-label data stream classification algorithms assume fully labeled data with no emerging new classes, and few of them are semi-supervised. Therefore, this paper proposes a semi-supervised ensemble classification algorithm for multi-label data streams based on co-training. First, the algorithm uses a sliding window mechanism to partition the data stream into data chunks. On each of the first w data chunks, the co-training-based multi-label semi-supervised classification algorithm COINS is used to train a base classifier, and the w COINS classifiers are combined into an ensemble model that can adapt to a data stream environment containing a large amount of unlabeled data. Meanwhile, a mechanism for detecting emerging new classes is introduced: the ensemble model predicts the (w+1)-th data chunk to detect whether a new class has emerged. When a new label is detected, a classifier is retrained on the current data chunk and the ensemble model is updated. Finally, experimental results on five real data sets show that, compared with classical algorithms, the proposed approach improves the classification accuracy on multi-label data streams with a large number of missing labels and emerging new labels.
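
The chunk-based procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the base learner `NearestNeighborBase` is a hypothetical placeholder for COINS (a 1-nearest-neighbor model with one pseudo-labeling pass, not co-training), the distance threshold `novelty_radius` is an assumed stand-in for the new-class detection mechanism, and dropping the oldest ensemble member on update is also an assumption. All function, class, and parameter names are illustrative.

```python
import math
from collections import Counter, deque


def chunks(stream, size):
    """Partition a stream (any iterable) into consecutive chunks of `size` items."""
    buf = []
    for item in stream:
        buf.append(item)
        if len(buf) == size:
            yield buf
            buf = []
    if buf:
        yield buf


class NearestNeighborBase:
    """Hypothetical placeholder for COINS: 1-NN plus one pseudo-labeling pass."""

    def __init__(self, chunk):
        # Each instance is (features, labels); labels is None when missing.
        labeled = [(x, y) for x, y in chunk if y is not None]
        if not labeled:
            raise ValueError("chunk needs at least one labeled instance")
        self.memory = list(labeled)
        # Give each unlabeled instance the label set of its nearest labeled one.
        for x, y in chunk:
            if y is None:
                self.memory.append((x, self._nearest(x, labeled)[1]))

    @staticmethod
    def _nearest(x, pool):
        # Returns (distance, label set) of the closest stored instance.
        return min(((math.dist(x, px), py) for px, py in pool), key=lambda t: t[0])

    def predict(self, x):
        return self._nearest(x, self.memory)


class ChunkEnsemble:
    """Ensemble of at most w base classifiers, one per data chunk."""

    def __init__(self, w, novelty_radius):
        self.members = deque(maxlen=w)  # assumption: oldest member is dropped
        self.novelty_radius = novelty_radius

    def add_chunk(self, chunk):
        self.members.append(NearestNeighborBase(chunk))

    def predict(self, x):
        # A label is predicted when a strict majority of members vote for it.
        votes = Counter()
        for member in self.members:
            votes.update(member.predict(x)[1])
        return {lab for lab, n in votes.items() if 2 * n > len(self.members)}

    def looks_novel(self, x):
        # Assumed detector: x is far from everything every member has stored.
        return all(m.predict(x)[0] > self.novelty_radius for m in self.members)


def run(stream, chunk_size, w, novelty_radius):
    """Train on the first w chunks, then check each later chunk for new classes."""
    ensemble = ChunkEnsemble(w, novelty_radius)
    flagged = []
    for i, chunk in enumerate(chunks(stream, chunk_size)):
        if i < w:
            ensemble.add_chunk(chunk)
            continue
        if any(ensemble.looks_novel(x) for x, _ in chunk):
            flagged.append(i)
            ensemble.add_chunk(chunk)  # retrain on the current chunk, update ensemble
    return ensemble, flagged
```

Here `run` returns the updated ensemble together with the indices of the chunks in which the assumed detector flagged a possible new class. Swapping in an actual co-training learner would only require replacing `NearestNeighborBase` with a class that exposes the same `predict` interface.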
