Deep learning on symbolic representations for large-scale heterogeneous time-series event prediction

In this paper, we consider the problem of event prediction with multivariate time-series data consisting of heterogeneous (continuous and categorical) variables. The complex dependencies between the variables, combined with the asynchronicity and sparsity of the data, make the event prediction problem particularly challenging. Most state-of-the-art approaches address this either by designing hand-engineered features or by breaking the problem up over homogeneous variates. In this work, we formulate the (rare) event prediction task as a classification problem with a novel asymmetric loss function and propose an end-to-end deep learning algorithm over symbolic representations of time series. The symbolic representations are fed into an embedding layer and a Long Short-Term Memory (LSTM) layer, which are trained to learn discriminative features. We also propose a simple sequence-chopping technique to speed up LSTM training on long temporal sequences. Experiments on real-world industrial datasets demonstrate the effectiveness of the proposed approach.
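The abstract names two concrete techniques without giving details: an asymmetric loss that penalizes missed rare events more than false alarms, and sequence chopping to shorten LSTM training sequences. As a rough illustration only (the function names, the class-weighted cross-entropy form, and the chunking scheme are assumptions, not the authors' actual formulation), these ideas can be sketched as:

```python
import numpy as np

def asymmetric_log_loss(y_true, p_pred, fn_weight=10.0, eps=1e-12):
    """Class-weighted binary cross-entropy: a hypothetical stand-in for
    the paper's asymmetric loss. Missing a rare positive event costs
    fn_weight times more than a false alarm on a negative."""
    p = np.clip(p_pred, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-(fn_weight * y_true * np.log(p)
                           + (1.0 - y_true) * np.log(1.0 - p))))

def chop_sequence(seq, chunk_len):
    """Split one long symbol sequence into fixed-length chunks so the
    LSTM is trained on shorter subsequences (sequence chopping)."""
    return [seq[i:i + chunk_len] for i in range(0, len(seq), chunk_len)]
```

Under this sketch, raising `fn_weight` makes a missed event dominate the loss, which is one simple way to bias a classifier toward detecting rare positives; the chopped chunks would each be fed to the embedding + LSTM stack as independent training sequences.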
