Tapped Delay Lines for GP Streaming Data Classification with Label Budgets

Streaming data classification requires that a model be available for classifying stream content while simultaneously detecting and reacting to changes in the underlying process generating the data. Given that only a fraction of the stream is ‘visible’ at any point in time (i.e. through some form of window interface), it is difficult to guarantee that a classifier encounters a ‘well mixed’ distribution of classes across the stream. Moreover, streaming data classifiers are required to operate under a limited label budget, since labelling all the data is too expensive. These requirements motivate the use of an active learning strategy for decoupling genetic programming training epochs from stream throughput. The content of a data subset is controlled by a combination of Pareto archiving and stochastic sampling. In addition, a significant benefit is attributed to supporting a tapped delay line (TDL) interface to the stream, although this also increases the dimensionality of the task. We demonstrate that the benefits of assuming the TDL can be maintained through the use of oversampling, without recourse to additional label information. Benchmarking on four datasets demonstrates that the approach is particularly effective when reacting to shifts in the underlying properties of the stream. Moreover, an online formulation for class-wise detection rate is assumed, which is able to robustly characterize classifier performance throughout the stream.
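As a minimal sketch of the tapped delay line interface described above, the generator below concatenates each stream sample with its most recent predecessors, so a classifier sees a temporal context rather than a single exemplar. The function name, tap count, and list-based sample representation are illustrative assumptions, not the paper's implementation; note how the output dimensionality grows by a factor of `taps`, which is the cost the abstract refers to.

```python
from collections import deque

def tdl_stream(stream, taps=3):
    """Yield tapped delay line feature vectors from a stream of samples.

    Illustrative sketch (not the paper's code): each output concatenates
    the current sample with the previous `taps - 1` samples, newest first,
    multiplying the attribute count of the task by `taps`.
    """
    buf = deque(maxlen=taps)  # sliding window of the last `taps` samples
    for x in stream:
        buf.appendleft(x)
        if len(buf) == taps:
            # flatten to [x(t), x(t-1), ..., x(t-taps+1)]
            yield [v for sample in buf for v in sample]
```

For example, with scalar samples and `taps=2`, `[[1.0], [2.0], [3.0]]` produces `[2.0, 1.0]` followed by `[3.0, 2.0]`; no output is emitted until the delay line is full.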
