Paper Information - A Cluster-Based Prototype Reduction for Online Classification

A Cluster-Based Prototype Reduction for Online Classification

Learning from data streams is a challenging research topic: data arrive continuously, and their probability distribution may change over time. Changes in the data distribution can give rise to different phenomena, such as concept drift, which occurs when the concepts associated with a dataset change as new data arrive. This paper proposes a new method based on k-Nearest Neighbors that uses a sliding window and requires fewer stored training instances than existing methods. To this end, a clustering approach summarizes the data by placing similar labeled instances in the same cluster. In addition, instances close to the uncertainty border of existing classes are stored in a sliding window so that the model can adapt to concept drift. The proposed method is experimentally compared with state-of-the-art classifiers from the data stream literature in terms of accuracy and processing time. According to the experimental results, the proposed method achieves better accuracy and lower processing time when less information about the concepts is stored in a single sliding window.
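The sliding-window idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it shows only a plain k-Nearest Neighbors classifier over a fixed-size window of recent labeled instances, and it leaves out the cluster-based summarization and the selection of instances near the uncertainty border. The names `SlidingWindowKNN`, `learn_one`, `predict_one`, and `prequential_accuracy`, as well as the parameters `k` and `window_size`, are hypothetical and chosen here for illustration.

```python
from collections import Counter, deque
import math


class SlidingWindowKNN:
    """Plain k-NN over the most recent labeled instances (illustrative only).

    Assumption: a fixed-size window; the paper's clustering and
    uncertainty-border mechanisms are NOT implemented here.
    """

    def __init__(self, k=3, window_size=100):
        self.k = k
        # A deque with maxlen drops the oldest instance once it is full,
        # so outdated concepts are gradually forgotten.
        self.window = deque(maxlen=window_size)

    def learn_one(self, x, y):
        """Store one labeled instance (x is a sequence of numbers)."""
        self.window.append((tuple(x), y))

    def predict_one(self, x):
        """Majority vote among the k nearest stored instances, or None if empty."""
        if not self.window:
            return None
        point = tuple(x)
        nearest = sorted(
            self.window, key=lambda item: math.dist(item[0], point)
        )[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


def prequential_accuracy(model, stream):
    """Test-then-train evaluation: predict each instance, then learn from it."""
    correct = total = 0
    for x, y in stream:
        pred = model.predict_one(x)
        if pred is not None:  # skip instances seen before any training
            correct += pred == y
            total += 1
        model.learn_one(x, y)
    return correct / total if total else 0.0
```

With a small `window_size`, instances from an old concept are evicted quickly, so predictions follow the most recent labels. A larger window keeps more history, at the cost of slower adaptation and more distance computations per prediction.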

André Carlos Ponce de Leon Ferreira de Carvalho | João Mendes-Moreira | Kemilly Dearo Garcia
