Data Stream Classification Based on the Gamma Classifier

The ever-increasing generation of data confronts us with the problem of handling massive amounts of information online. One of the biggest challenges is how to extract valuable information from these massive continuous data streams in a single scan. In a data stream context, data arrive continuously at high speed; therefore, algorithms developed for this context must be efficient in their use of memory and time, and capable of detecting changes over time in the underlying distribution that generated the data. This work describes a novel method for pattern classification over a continuous data stream based on an associative model. The proposed method is based on the Gamma classifier, which is inspired by the Alpha-Beta associative memories; both are supervised pattern recognition models. The proposed method is capable of handling the space and time constraints inherent to data stream scenarios. The Data Streaming Gamma classifier (DS-Gamma classifier) implements a sliding window approach to provide concept drift detection and a forgetting mechanism. To test the classifier, several experiments were performed using different data stream scenarios with real and synthetic data streams. The experimental results show that the method exhibits competitive performance when compared to other state-of-the-art algorithms.
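
The sliding window idea mentioned above can be illustrated with a minimal sketch. This is not the DS-Gamma classifier: a plain 1-nearest-neighbour rule stands in for the Gamma classifier, and the names `SlidingWindowClassifier`, `prequential_accuracy`, and `window_size` are hypothetical, chosen only for illustration. The sketch shows how a fixed-size window bounds memory and forgets old patterns, and how a stream can be evaluated in a single pass by classifying each pattern before learning from it.

```python
from collections import deque


class SlidingWindowClassifier:
    """Stand-in 1-nearest-neighbour learner over a fixed-size window.

    Assumption: this is NOT the Gamma classifier, only a placeholder
    that makes the windowing behaviour visible.
    """

    def __init__(self, window_size):
        # deque(maxlen=...) drops the oldest pattern automatically,
        # which acts as the forgetting mechanism and bounds memory.
        self.window = deque(maxlen=window_size)

    def predict(self, x):
        # No prediction is possible before any pattern has been seen.
        if not self.window:
            return None
        nearest = min(
            self.window,
            key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x)),
        )
        return nearest[1]

    def learn(self, x, label):
        self.window.append((tuple(x), label))


def prequential_accuracy(stream, window_size):
    """Test-then-train: classify each pattern, then add it to the window."""
    model = SlidingWindowClassifier(window_size)
    correct = total = 0
    for x, label in stream:
        prediction = model.predict(x)
        if prediction is not None:
            total += 1
            correct += prediction == label
        model.learn(x, label)
    return correct / total if total else 0.0
```

Because only the most recent `window_size` patterns are kept, labels from an outdated concept eventually leave the window, so later predictions reflect the current distribution. How the window size is chosen, and how drift is detected, is specific to the actual method and is not modelled here.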
