The incremental Fourier classifier: Leveraging the discrete Fourier transform for classifying high speed data streams

Abstract

Two major performance bottlenecks of decision-tree-based classifiers in a data stream environment are the depth of the tree and the overhead of updating leaf node statistics instance by instance so that classification stays consistent with the current state of the stream. Previous research has shown that classifiers based on Fourier spectra derived from decision trees produce compact array structures that can be searched and maintained much more efficiently than deep tree structures. However, the key issue of incrementally adapting the spectrum to changes has not been addressed. In this research we present a strategy for incrementally maintaining the Fourier spectrum as concepts change in data stream environments. Along with the incremental approach, we propose schemes for feature selection and synopsis generation that enable the coefficient array to be refreshed efficiently on a periodic basis. Our empirical evaluation against a number of widely used stream classifiers shows that the Fourier classifier outperforms them in both classification accuracy and classification speed.
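To make the idea of a Fourier spectrum derived from a decision tree concrete, the following minimal sketch computes the spectrum of a small tree over binary features and classifies from the coefficient array alone. It illustrates only the background representation, not the incremental maintenance, feature selection, or synopsis schemes proposed here. It assumes binary features and the Walsh basis (-1)^(j·x); the tree `toy_tree` and the function names `walsh`, `spectrum`, and `classify` are hypothetical and chosen for illustration.

```python
from itertools import product

def walsh(j, x):
    """Walsh basis function (-1)^(j.x) for binary tuples j and x."""
    return -1 if sum(a & b for a, b in zip(j, x)) % 2 else 1

def spectrum(f, d):
    """Fourier coefficients of f over all 2^d binary inputs (brute force)."""
    domain = list(product((0, 1), repeat=d))
    n = len(domain)
    return {j: sum(f(x) * walsh(j, x) for x in domain) / n for j in domain}

def classify(coeffs, x):
    """Inverse transform: recover f(x) from the coefficient array."""
    return sum(w * walsh(j, x) for j, w in coeffs.items())

def toy_tree(x):
    """Hypothetical depth-2 tree: split on x[0], then on x[1]; x[2] is unused."""
    if x[0] == 0:
        return 1
    return 1 if x[1] == 1 else 0

coeffs = spectrum(toy_tree, 3)
# Keep only the nonzero coefficients; this is the compact array.
compact = {j: w for j, w in coeffs.items() if w != 0}
for x in product((0, 1), repeat=3):
    print(x, toy_tree(x), classify(compact, x))
```

In this toy case every coefficient whose index involves the unused feature `x[2]` is zero, so the array is smaller than the full set of 2^d coefficients. The brute-force enumeration in `spectrum` is exponential in the number of features and is used here only for clarity.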
