UNSUPERVISED ANOMALY DETECTION IN SEQUENCES USING LONG SHORT TERM MEMORY RECURRENT NEURAL NETWORKS

Majid S. alDosari, George Mason University, 2016

Thesis Director: Dr. Kirk D. Borne

Long Short Term Memory (LSTM) recurrent neural networks (RNNs) are evaluated for their potential to generically detect anomalies in sequences. First, anomaly detection techniques are surveyed at a high level to expose their shortcomings. These shortcomings are mainly an inflexible choice of context ‘window’ size and/or suboptimal performance on sequences. Furthermore, techniques that perform well on sequences are usually tied to their respective knowledge domains. After these shortcomings are discussed, RNNs are presented mathematically as generic sequence modelers that can handle sequences of arbitrary length. Experimental results then show that RNNs can detect anomalies in a set of test sequences, each with different types of anomalies and its own normal behavior. Given the characteristics of the test data, it was concluded that the RNNs could generically distinguish not only rare values in the data (out of context) but also abnormal patterns (in context).

In addition to the anomaly detection work, a solution for reproducing computational research is described. The solution addresses the reproduction of compute applications, based on Docker container technology, as well as the automation of the infrastructure that runs them. By design, little distinction is made between local and remote execution, so the researcher can move seamlessly from local (test) execution to remote (production) execution. Such flexibility and automation allow the researcher to be more confident of the results and more productive, especially when working with multiple machines.
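
One common way to turn a sequence model into an unsupervised detector is to have it predict the next element and to score each element by its prediction error. Because a recurrent model updates its state one element at a time, this needs no fixed context window and works on sequences of any length. The following is a minimal Python sketch under stated assumptions, not the thesis's procedure. A fixed one-dimensional linear recurrence (exponential smoothing) stands in for a trained LSTM. The names `smoothing_step`, `prediction_errors`, and `flag_anomalies`, the smoothing factor `alpha`, and the mean-plus-`k`-standard-deviations threshold are all hypothetical choices made for illustration.

```python
from statistics import mean, pstdev
from typing import Callable, List, Sequence, Tuple

# A recurrent step maps (state, current value) -> (new state, prediction of the next value).
Step = Callable[[float, float], Tuple[float, float]]


def smoothing_step(alpha: float) -> Step:
    """Hypothetical stand-in for a trained LSTM: exponential smoothing,
    a one-dimensional linear recurrence whose state summarizes all past values."""
    def step(state: float, x: float) -> Tuple[float, float]:
        new_state = alpha * x + (1.0 - alpha) * state
        return new_state, new_state  # predict that the next value equals the state
    return step


def prediction_errors(seq: Sequence[float], step: Step) -> List[float]:
    """Absolute one-step-ahead prediction error for each element of seq[1:].

    The state is carried forward element by element, so no fixed context
    window is imposed and seq may have any length >= 1."""
    state = seq[0]
    errors = []
    for prev, current in zip(seq, seq[1:]):
        state, predicted = step(state, prev)
        errors.append(abs(current - predicted))
    return errors


def flag_anomalies(seq: Sequence[float], step: Step, k: float = 3.0) -> List[int]:
    """Indices of elements whose prediction error exceeds mean + k * std
    of all errors in the sequence (an assumed, simple thresholding rule)."""
    errors = prediction_errors(seq, step)
    if not errors:
        return []
    mu, sigma = mean(errors), pstdev(errors)
    if sigma == 0:
        return []  # every element was predicted equally well
    # errors[t] scores seq[t + 1]
    return [t + 1 for t, e in enumerate(errors) if e > mu + k * sigma]


if __name__ == "__main__":
    signal = [1.0] * 50
    signal[30] = 10.0  # a single out-of-context spike
    print(flag_anomalies(signal, smoothing_step(alpha=0.5)))  # → [30]
```

Running the script prints `[30]`. The spike's error (9.0) exceeds the threshold of roughly 4.7. The smaller error at index 31 (4.5) comes from the spike leaking into the smoothed state, and it stays below the threshold. In principle, a trained LSTM's step function could replace `smoothing_step` without changing the scoring loop. That is the setting in which contextual (pattern) anomalies become detectable, which this linear stand-in cannot model.
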

Chapter 1: Introduction

In the modern world, large amounts of time series data¹ of various types are recorded. Inexpensive and compact instrumentation and storage allow many kinds of processes to be recorded. For example, recorded human activity includes physiological signals, automotive traffic, website navigation activity, and communication network traffic. Other kinds of data are captured from instrumentation in industrial processes, automobiles, space probes, telescopes, geological formations, oceans, power lines, and residential thermostats. Furthermore, data can be machine-generated for diagnostic purposes, such as web server logs, system startup logs, and satellite status logs.

Increasingly, these data are being analyzed. Inexpensive and ubiquitous networking allows the data to be transmitted for processing, while ubiquitous computing allows the data to be processed where they are captured. Although the data can be recorded for historical purposes, much value can be obtained from finding anomalous data. However, it is challenging to manually analyze large and varied quantities of data to find anomalies. Even if a procedure can be developed for one type of data, it usually cannot be applied to another type. Hence, the problem addressed here can be stated as follows: find anomalous points in an arbitrary (unlabeled) sequence. A solution must therefore use the same procedure to analyze different types of time series data.

The solution presented here comes from an unsupervised use of recurrent neural networks. A literature search readily yields only two similar solutions. In the acoustics domain, [1] transform audio signals into a sequence of spectral features, which are then input to a denoising recurrent autoencoder. Improving on this, [2] apply recurrent neural networks directly to multiple domains, without features specific to a problem domain such as acoustics. This work closely resembles [2], but it emphasizes a single, highly automated procedure that applies to many domains.

First, some background on anomaly detection² is given to explain the challenges of finding a solution. Second, recurrent neural networks are introduced as general sequence modelers. Then, experiments are presented to show that recurrent neural networks can find different types of anomalies in multiple domains. Finally, concluding remarks are given.

¹ In this document, the terms ‘time series’ and ‘sequence’ are used interchangeably, with no implication for the discussion. Strictly, however, a time series is a sequence of time-indexed elements, so a sequence is the more general object. As such, the term ‘sequence’ is used when a general context is more applicable. Furthermore, the terms do not imply that the data are real-valued, discrete, or symbolic, although the literature frequently uses ‘time series’ and ‘sequence’ for real-valued and symbolic data, respectively. Here, the term ‘time series’ is used to emphasize that much data is recorded from monitoring devices, which implies that a timestamp is associated with each data point.

² Outlier, surprise, novelty, and deviation detection are alternative names used in the literature.

Chapter 2: The Challenge of Anomaly Detection in Sequences

References

[1] Leonid Portnoy et al. Intrusion detection with unlabeled data using clustering, 2000.

[2] Hans-Peter Kriegel et al. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise, 1996, KDD.

[3] Li Wei et al. Experiencing SAX: a novel symbolic representation of time series, 2007, Data Mining and Knowledge Discovery.

[4] Li Wei et al. SAXually Explicit Images: Finding Unusual Shapes, 2006, Sixth International Conference on Data Mining (ICDM'06).

[5] B. Ng. Survey of Anomaly Detection Methods, 2006.

[6] W. Drosdowsky et al. An analysis of Australian seasonal rainfall anomalies: 1950–1987. I: Spatial patterns, 1993.

[7] Jonathan M. Borwein et al. SIAM: “Setting the Default to Reproducible” in Computational Science Research, 2013.

[8] Marvin Minsky et al. Computation: finite and infinite machines, 2016.

[9] Marimuthu Palaniswami et al. Privacy-Preserving Collaborative Anomaly Detection for Participatory Sensing, 2014, PAKDD.

[10] Terran Lane et al. An Application of Machine Learning to Anomaly Detection, 1999.

[11] Srinivasan Parthasarathy et al. Fast Distributed Outlier Detection in Mixed-Attribute Data Sets, 2006, Data Mining and Knowledge Discovery.

[12] Wojciech Zaremba et al. Recurrent Neural Network Regularization, 2014, ArXiv.

[13] Sanjay Chawla et al. Spatio-temporal Outlier Detection in Precipitation Data, 2008, KDD Workshop on Knowledge Discovery from Sensor Data.

[14] Jian Pei et al. WAT: Finding Top-K Discords in Time Series Database, 2007, SDM.

[15] Philip K. Chan et al. Trajectory boundary modeling of time series for anomaly detection, 2005.

[16] Georg Carle et al. Traffic Anomaly Detection Using K-Means Clustering, 2007.

[17] Jean YH Yang et al. Bioconductor: open software development for computational biology and bioinformatics, 2004, Genome Biology.

[18] J. Schmidhuber et al. Framewise phoneme classification with bidirectional LSTM networks, 2005, Proceedings of the 2005 IEEE International Joint Conference on Neural Networks.

[19] Yoshua Bengio et al. Practical Recommendations for Gradient-Based Training of Deep Architectures, 2012, Neural Networks: Tricks of the Trade.

[20] Hava T. Siegelmann et al. Analog computation via neural networks, 1993, The 2nd Israel Symposium on Theory and Computing Systems.

[21] Lei Xie et al. Photo-real talking head with deep bidirectional LSTM, 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[22] Kuldip K. Paliwal et al. Bidirectional recurrent neural networks, 1997, IEEE Trans. Signal Process.

[23] Razvan Pascanu et al. Advances in optimizing recurrent networks, 2012, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[24] Boleslaw K. Szymanski et al. Recursive data mining for masquerade detection and author identification, 2004, Proceedings from the Fifth Annual IEEE SMC Information Assurance Workshop, 2004.

[25] Dennis Shasha et al. Efficient elastic burst detection in data streams, 2003, KDD '03.

[26] Carla E. Brodley et al. Temporal sequence learning and data reduction for anomaly detection, 1998, CCS '98.

[27] Erik Marchi et al. A novel approach for automatic acoustic novelty detection using a denoising autoencoder with bidirectional LSTM neural networks, 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[28] Yiguo Qiao et al. Anomaly intrusion detection method based on HMM, 2002.

[29] Lovekesh Vig et al. Long Short Term Memory Networks for Anomaly Detection in Time Series, 2015, ESANN.

[30] Dipankar Dasgupta et al. Novelty detection in time series data using ideas from immunology, 1996.

[31] Wojciech Zaremba et al. An Empirical Exploration of Recurrent Network Architectures, 2015, ICML.

[32] Saeed Aghabozorgi et al. A Review of Subsequence Time Series Clustering, 2014, The Scientific World Journal.

[33] Nong Ye et al. A Markov Chain Model of Temporal Behavior for Anomaly Detection, 2000.

[34] Razvan Pascanu et al. How to Construct Deep Recurrent Neural Networks, 2013, ICLR.

[35] Guigang Zhang et al. Deep Learning, 2016, Int. J. Semantic Comput.

[36] Saeed Amizadeh et al. Generic and Scalable Framework for Automated Time-series Anomaly Detection, 2015, KDD.

[37] Raymond T. Ng et al. A unified approach for mining outliers, 1997, CASCON.

[38] Barak A. Pearlmutter et al. Detecting intrusions using system calls: alternative data models, 1999, Proceedings of the 1999 IEEE Symposium on Security and Privacy (Cat. No.99CB36344).

[39] James Martens et al. Deep learning via Hessian-free optimization, 2010, ICML.

[40] Eamonn J. Keogh et al. Finding Time Series Discords Based on Haar Transform, 2006, ADMA.

[41] Varun Chandola et al. Anomaly detection: A survey, 2009, CSUR.

[42] Geoffrey E. Hinton et al. Speech recognition with deep recurrent neural networks, 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[43] David J. Hill et al. Anomaly detection in streaming environmental sensor data: A data-driven modeling approach, 2010, Environ. Model. Softw.

[44] Eamonn J. Keogh et al. Experimental comparison of representation methods and distance measures for time series data, 2010, Data Mining and Knowledge Discovery.

[45] Rayford B. Vaughn et al. Efficient Modeling of Discrete Events for Anomaly Detection Using Hidden Markov Models, 2005, ISC.

[46] Stephanie Forrest et al. Intrusion Detection Using Sequences of System Calls, 1998, J. Comput. Secur.

[47] Hans-Peter Kriegel et al. OPTICS-OF: Identifying Local Outliers, 1999, PKDD.

[48] Geoffrey E. Hinton et al. Learning representations by back-propagating errors, 1986, Nature.

[49] Daniel J. Blankenberg et al. Galaxy: a platform for interactive large-scale genome analysis, 2005, Genome Research.

[50] Philip K. Chan et al. Modeling multiple time series for anomaly detection, 2005, Fifth IEEE International Conference on Data Mining (ICDM'05).

[51] Yoshua Bengio et al. Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription, 2012, ICML.

[52] Eamonn J. Keogh et al. HOT SAX: efficiently finding the most unusual time series subsequence, 2005, Fifth IEEE International Conference on Data Mining (ICDM'05).

[53] Subhashini Venugopalan et al. Translating Videos to Natural Language Using Deep Recurrent Neural Networks, 2014, NAACL.

[54] Ilya Sutskever et al. Learning Recurrent Neural Networks with Hessian-Free Optimization, 2011, ICML.

[55] Benjamin Schrauwen et al. Training and analyzing deep recurrent neural networks, 2013, NIPS 2013.

[56] Paul Sava et al. Madagascar: open-source software project for multidimensional data analysis and reproducible computational experiments, 2013.

[57] Majid Sarrafzadeh et al. Dimensionality Reduction for Anomaly Detection in Electrocardiography: A Manifold Approach, 2012, 2012 Ninth International Conference on Wearable and Implantable Body Sensor Networks.

[58] Yizhou Sun et al. Community Trend Outlier Detection Using Soft Temporal Pattern Mining, 2012, ECML/PKDD.

[59] Ying Zhang et al. Batch normalized recurrent neural networks, 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[60] Eamonn J. Keogh et al. Disk aware discord discovery: finding unusual time series in terabyte sized datasets, 2007, Seventh IEEE International Conference on Data Mining (ICDM 2007).

[61] R. Lasaponara. On the use of principal component analysis (PCA) for evaluating interannual vegetation anomalies from SPOT/VEGETATION NDVI temporal series, 2006.

[62] Eamonn J. Keogh et al. Clustering of time-series subsequences is meaningless: implications for previous and future research, 2004, Knowledge and Information Systems.

[63] Jeffrey M. Hausdorff et al. Physionet: Components of a New Research Resource for Complex Physiologic Signals, 2000, Circulation.

[64] S. Muthukrishnan et al. Mining Deviants in a Time Series Database, 1999, VLDB.

[65] Sepp Hochreiter et al. Untersuchungen zu dynamischen neuronalen Netzen, 1991.

[66] Christopher Kermorvant et al. The A2iA Arabic Handwritten Text Recognition System at the Open HaRT2013 Evaluation, 2014, 2014 11th IAPR International Workshop on Document Analysis Systems.

[67] Björn W. Schuller et al. Social signal classification using deep BLSTM recurrent neural networks, 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[68] Jonathan Goldstein et al. When Is “Nearest Neighbor” Meaningful?, 1999, ICDT.

[69] Ma Xiujun et al. Detecting spatio-temporal outliers in climate dataset: a method study, 2005, Proceedings of the 2005 IEEE International Geoscience and Remote Sensing Symposium (IGARSS '05).

[70] Eamonn J. Keogh et al. Dimensionality Reduction for Fast Similarity Search in Large Time Series Databases, 2001, Knowledge and Information Systems.

[71] K. Doya et al. Bifurcations in the learning of recurrent neural networks, 1992, Proceedings of the 1992 IEEE International Symposium on Circuits and Systems.

[72] Yoshua Bengio et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation, 2014, EMNLP.

[73] Hava T. Siegelmann et al. On the Computational Power of Neural Nets, 1995, J. Comput. Syst. Sci.

[74] Eamonn J. Keogh et al. Finding the most unusual time series subsequence: algorithms and applications, 2006, Knowledge and Information Systems.

[75] Razvan Pascanu et al. Theano: A CPU and GPU Math Compiler in Python, 2010, SciPy.

[76] Amy Loutfi et al. A review of unsupervised feature learning and deep learning for time-series modeling, 2014, Pattern Recognit. Lett.

[77] Yoshua Bengio et al. Equilibrated adaptive learning rates for non-convex optimization, 2015, NIPS.

[78] Quoc V. Le et al. Sequence to Sequence Learning with Neural Networks, 2014, NIPS.

[79] Alex Graves et al. Generating Sequences With Recurrent Neural Networks, 2013, ArXiv.

[80] Samy Bengio et al. Show and tell: A neural image caption generator, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[81] Daniel Nikovski et al. Anomaly Detection in Real-Valued Multidimensional Time Series, 2014.

[82] Zhen Guo et al. Tracking Probabilistic Correlation of Monitoring Data for Fault Detection in Complex Systems, 2006, International Conference on Dependable Systems and Networks (DSN'06).

[83] Derya Birant et al. Spatio-temporal outlier detection in large databases, 2006, 28th International Conference on Information Technology Interfaces, 2006.

[84] Zachary Chase Lipton. A Critical Review of Recurrent Neural Networks for Sequence Learning, 2015, ArXiv.

[85] Robert Sedgewick et al. Fast algorithms for sorting and searching strings, 1997, SODA '97.

[86] Li Wei et al. Assumption-Free Anomaly Detection in Time Series, 2005, SSDBM.

[87] Mooi Choo Chuah et al. ECG Anomaly Detection via Time Series Analysis, 2007, ISPA Workshops.

[88] Jürgen Schmidhuber et al. Long Short-Term Memory, 1997, Neural Computation.

[89] Jean-Christophe Nebel et al. Temporal Extension of Laplacian Eigenmaps for Unsupervised Dimensionality Reduction of Time Series, 2010, 2010 20th International Conference on Pattern Recognition.

[90] Philip Chan et al. Learning States and Rules for Detecting Anomalies in Time Series, 2005, Applied Intelligence.

[91] Lionel Tarassenko et al. A System for the Analysis of Jet Engine Vibration Data, 1999, Integr. Comput. Aided Eng.

[92] Deepthi Cheboli et al. Anomaly detection of time series, 2010.

[93] Fabrizio Angiulli et al. Detecting distance-based outliers in streams of data, 2007, CIKM '07.

[94] Zengyou He et al. Discovering cluster-based local outliers, 2003, Pattern Recognit. Lett.

[95] Hui Xiong et al. Top-Eye: top-k evolving trajectory outlier detection, 2010, CIKM.

[96] Eamonn J. Keogh et al. Locally adaptive dimensionality reduction for indexing large time series databases, 2001, SIGMOD '01.

[97] Yoshua Bengio et al. Learning long-term dependencies with gradient descent is difficult, 1994, IEEE Trans. Neural Networks.

[98] Hans-Peter Kriegel et al. Local outlier detection reconsidered: a generalized view on locality with applications to spatial, video, and network outlier detection, 2012, Data Mining and Knowledge Discovery.

[99] Razvan Pascanu et al. On the difficulty of training recurrent neural networks, 2012, ICML.

[100] Ronald L. Rivest et al. Training a 3-node neural network is NP-complete, 1988, COLT '88.

[101] Jeffrey Scott Vitter et al. Mining deviants in time series data streams, 2004, Proceedings of the 16th International Conference on Scientific and Statistical Database Management, 2004.

[102] Charu C. Aggarwal et al. Outlier Detection for Temporal Data: A Survey, 2014, IEEE Transactions on Knowledge and Data Engineering.

[103] Haifeng Chen et al. Modeling and Tracking of Transaction Flow Dynamics for Fault Detection in Complex Systems, 2006, IEEE Transactions on Dependable and Secure Computing.

[104] Jasper Snoek et al. Practical Bayesian Optimization of Machine Learning Algorithms, 2012, NIPS.

[105] David L. Donoho et al. WaveLab and Reproducible Research, 1995.

[106] Mohammed J. Zaki et al. ADMIT: anomaly-based data mining for intrusions, 2002, KDD.

[107] Pingzhi Fan et al. A new anomaly detection method based on hierarchical HMM, 2003, Proceedings of the Fourth International Conference on Parallel and Distributed Computing, Applications and Technologies.

[108] Eyal Amir et al. Real-time Bayesian anomaly detection in streaming environmental data, 2007.

[109] Vipin Kumar et al. Comparative Evaluation of Anomaly Detection Techniques for Sequence Data, 2008, 2008 Eighth IEEE International Conference on Data Mining.

[110] Joaquín González-Rodríguez et al. Automatic language identification using long short-term memory recurrent neural networks, 2014, INTERSPEECH.

[111] Jae-Gil Lee et al. Temporal Outlier Detection in Vehicle Traffic Data, 2009, 2009 IEEE 25th International Conference on Data Engineering.

[112] Zhilin Li et al. A Multiscale Approach for Spatio-Temporal Outlier Detection, 2006, Trans. GIS.

[113] Pascal Vincent et al. Generalized Denoising Auto-Encoders as Generative Models, 2013, NIPS.

[114] Eamonn J. Keogh et al. On the Need for Time Series Data Mining Benchmarks: A Survey and Empirical Demonstration, 2002, Data Mining and Knowledge Discovery.

[115] Jürgen Schmidhuber et al. LSTM: A Search Space Odyssey, 2015, IEEE Transactions on Neural Networks and Learning Systems.

[116] Martin Meckesheimer et al. Automatic outlier detection for time series: an application to sensor data, 2007, Knowledge and Information Systems.

[117] Geoffrey E. Hinton et al. Generating Text with Recurrent Neural Networks, 2011, ICML.

[118] Eamonn J. Keogh et al. Probabilistic discovery of time series motifs, 2003, KDD '03.

[119] P. Protopapas et al. Finding outlier light curves in catalogues of periodic variable stars, 2005, astro-ph/0505495.

[120] Chang-Tien Lu et al. Wavelet fuzzy classification for detecting and tracking region outliers in meteorological data, 2004, GIS '04.

[121] Frank K. Soong et al. TTS synthesis with bidirectional LSTM based recurrent neural networks, 2014, INTERSPEECH.

[122] Andrew W. Senior et al. Long Short-Term Memory Based Recurrent Neural Network Architectures for Large Vocabulary Speech Recognition, 2014, ArXiv.