Semi-supervised Classification of Concept Drift Data Stream Based on Local Component Replacement

Compared with traditional data mining, data streams have three distinct characteristics that pose new challenges to machine learning and data mining, and these challenges become more serious when only a few instances in the stream are labeled. In this paper, building on the SPASC algorithm, we propose a local component replacement strategy for updating the classifier pool. The strategy defines a vector based on local accuracy to evaluate how well each “component” of a cluster-based classifier adapts to a new chunk. This lets the trained cluster-based classifiers in the pool adapt to the current concept better and faster while retaining as much previously learned knowledge as possible. The proposed algorithm is compared with state-of-the-art baseline methods on multiple datasets, and the experimental results illustrate its effectiveness.
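The abstract describes the update strategy only at a high level. What follows is a minimal Python sketch of one way a per-component local-accuracy vector could drive component replacement. It is not the paper's definition of the vector or of SPASC. The sketch makes several assumptions. A cluster-based classifier is represented as a list of labeled centroids. Labeled chunk instances are routed to their nearest centroid (Euclidean distance). A hypothetical threshold of 0.5 marks a component as poorly adapted. A poorly adapted component is swapped for the nearest component of a classifier built on the new chunk. The names `Component`, `local_accuracy_vector`, and `replace_components` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Component:
    """One cluster of a cluster-based classifier (hypothetical representation)."""
    centroid: list[float]
    label: int


def nearest(components: list[Component], x: list[float]) -> int:
    """Index of the component whose centroid is closest to x (Euclidean)."""
    return min(range(len(components)), key=lambda i: math.dist(components[i].centroid, x))


def local_accuracy_vector(components, X, y):
    """Assumed definition: for each component, the fraction of labeled chunk
    instances routed to it whose label matches the component's label.
    Components that receive no labeled instance get None (no evidence)."""
    hits = [0] * len(components)
    counts = [0] * len(components)
    for x, label in zip(X, y):
        i = nearest(components, x)
        counts[i] += 1
        hits[i] += int(components[i].label == label)
    return [h / c if c else None for h, c in zip(hits, counts)]


def replace_components(components, new_components, X, y, threshold=0.5):
    """Keep components whose local accuracy is at least `threshold` (or unknown);
    replace the others with the nearest component learned from the new chunk.
    The threshold and the replacement rule are illustrative assumptions."""
    acc = local_accuracy_vector(components, X, y)
    updated = []
    for comp, a in zip(components, acc):
        if a is not None and a < threshold:
            updated.append(new_components[nearest(new_components, comp.centroid)])
        else:
            updated.append(comp)  # retain previously learned knowledge
    return updated, acc
```

For example, suppose an old classifier has components `Component([0.0, 0.0], 0)` and `Component([5.0, 5.0], 1)`. A new chunk then labels the points near (5, 5) as class 0. The local-accuracy vector becomes `[1.0, 0.0]`. Only the second component is replaced, so the still-valid first component is kept rather than discarding the whole classifier.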
