Error-space representations for multi-dimensional data streams with temporal dependence

Abstract

In many application scenarios, data points are not only temporally dependent but also arrive in the form of a fast-moving stream. A broad selection of efficient learning algorithms exists that may be applied to data streams, but they typically do not take the temporal nature of the data into account. We motivate and design a method that creates an efficient representation of a data stream, where temporal information is embedded into each instance via the error space of forecasting models. Unlike many other methods in the literature, our approach can be rapidly initialized and does not require iterations over the full data sequence, making it suitable for a streaming scenario. This allows the application of off-the-shelf data-stream methods, depending on the application domain. In this paper, we investigate classification. We compare against a large variety of methods (auto-encoders, HMMs, basis functions, clustering methodologies, and PCA) and find that our proposed method performs very competitively and offers much promise for future work.
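The core idea of an error-space representation can be sketched as follows: each incoming instance is augmented with the residuals of simple online forecasters, so that temporal context is embedded into the instance itself and any off-the-shelf stream classifier can then be applied. This is a minimal, hypothetical illustration (the forecaster choices, class name, and window size are assumptions, not the paper's exact method):

```python
from collections import deque
import numpy as np

class ErrorSpaceRepresenter:
    """Hypothetical sketch of an error-space representation.

    Each arriving vector is augmented with the prediction errors
    (residuals) of two simple online forecasters: a last-value
    forecaster and a moving-average forecaster. The augmented vector
    embeds temporal context per instance, so a standard stream
    classifier can consume it without any further sequence modeling.
    """

    def __init__(self, window=5):
        # Bounded history: the representer needs no pass over the
        # full sequence, so it suits a streaming scenario.
        self.history = deque(maxlen=window)

    def transform(self, x):
        x = np.asarray(x, dtype=float)
        if self.history:
            last = self.history[-1]                 # last-value forecast
            mean = np.mean(self.history, axis=0)    # moving-average forecast
        else:
            # Cold start: no history yet, forecast zeros.
            last = np.zeros_like(x)
            mean = np.zeros_like(x)
        # Error-space features: each forecaster's residual on x.
        e_last = x - last
        e_mean = x - mean
        self.history.append(x)
        return np.concatenate([x, e_last, e_mean])
```

In use, each `transform` output would be fed to an incremental classifier (e.g., one from MOA or scikit-multiflow, both cited above) in place of the raw instance.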
