Contextual anomaly detection on time series: a case study of metro ridership analysis

The growing volume of data collected in the transport domain can greatly benefit mobility studies and yield high-value mobility information for passengers, data analysts, and transport operators. This work addresses the detection of the impact of disturbances on a transport network. Using smart card data, it aims to finely quantify the impact of known disturbances on network usage and to reveal unexplained statistical anomalies that may be related to unknown disturbances. The mobility data studied take the form of a multivariate time series evolving in a dynamic environment, with additional contextual attributes. The research focuses mainly on contextual anomaly detection using machine learning models. Our main goal is to build a robust anomaly score that highlights statistical anomalies (contextual extrema) while accounting for the variability within the time series induced by the dynamic context. The score is built from forecasting residuals normalized by the estimated contextual variance. Indeed, both the mean and the variance of the ridership time series follow complex dynamics induced by the flexible transportation schedule, the variability in transport demand, and contextual factors such as station location and calendar information. The anomaly detection approach must therefore account for these dynamics to produce a reliable anomaly score. We investigate several prediction models (including an LSTM encoder-decoder from the recurrent neural network family of deep learning models) and several variance estimators, either obtained through dedicated models or extracted from the prediction models. The proposed approaches are evaluated on synthetic data and on real smart card ridership data from the Quebec Metro network. The real data set includes a record of events and disturbances that have impacted the transport network. The experiments show the relevance of variance normalization of prediction residuals for building a robust anomaly score under a dynamic context.
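
To make the idea of a variance-normalized residual concrete, here is a minimal sketch on synthetic data. It is not the authors' pipeline: a per-context empirical mean and standard deviation stand in for the prediction models and variance estimators mentioned above, the hour of day is the only context, and all names (`fit_context_stats`, `anomaly_scores`, `simulate`) and the threshold of 3 are assumptions made for illustration.

```python
import random
from collections import defaultdict
from statistics import mean, pstdev


def fit_context_stats(values, contexts):
    """Estimate a mean and a standard deviation per context.

    Assumption: these simple estimates replace the forecasting model
    and the contextual variance estimator described in the abstract.
    """
    groups = defaultdict(list)
    for value, context in zip(values, contexts):
        groups[context].append(value)
    return {c: (mean(g), pstdev(g)) for c, g in groups.items()}


def anomaly_scores(values, contexts, stats, eps=1e-9):
    """Residual divided by the contextual standard deviation."""
    scores = []
    for value, context in zip(values, contexts):
        mu, sigma = stats[context]
        scores.append((value - mu) / (sigma + eps))
    return scores


def simulate(n_days, rng):
    """Synthetic hourly ridership: peak hours have a higher mean AND variance."""
    values, contexts = [], []
    for _ in range(n_days):
        for hour in range(24):
            peak = hour in (8, 17)
            mu, sigma = (1000.0, 100.0) if peak else (200.0, 10.0)
            values.append(rng.gauss(mu, sigma))
            contexts.append(hour)
    return values, contexts


rng = random.Random(0)
train_values, train_contexts = simulate(200, rng)
stats = fit_context_stats(train_values, train_contexts)

# Two observations that both sit about 80 passengers above the expected level:
# one at a peak hour (high variance), one at night (low variance).
test_values = [1080.0, 280.0]
test_contexts = [8, 3]

scores = anomaly_scores(test_values, test_contexts, stats)
flags = [abs(s) > 3.0 for s in scores]  # hypothetical threshold
```

Both test points have roughly the same raw residual, yet only the night-time one is flagged, because 80 extra passengers is within normal fluctuation at a peak hour and far outside it at 3 a.m. This is the motivation for normalizing residuals by a context-dependent variance rather than thresholding them directly.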
