Mining Anomalies in Subspaces of High-Dimensional Time Series for Financial Transactional Data

Anomaly detection for high-dimensional time series is difficult because of the vast search space. In general high-dimensional data, anomalies often manifest in subspaces rather than in the whole data space, and finding the exact solution (i.e., the anomalous subspaces) requires an O(2^N) combinatorial search, where N denotes the number of dimensions. In this paper, we present a novel and practical unsupervised anomaly retrieval system that retrieves anomalies from a large volume of high-dimensional transactional time series. Our system consists of two integrated modules: a subspace searching module and a time series discord mining module. For the subspace searching module, we propose two approximate search methods that find quality anomalous subspaces orders of magnitude faster than the brute-force solution. For the discord mining module, we adopt a simple yet effective nearest neighbor method. The proposed system is implemented and evaluated on both synthetic and real-world transactional data. The results indicate that our anomaly retrieval system can localize high-quality anomaly candidates in seconds, making it practical for use in a production environment.
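
To make the cost of the exact search concrete, the following is a minimal illustrative sketch, not the system described above. It pairs a brute-force nearest neighbor discord score with an exhaustive enumeration of all 2^N - 1 non-empty subspaces. The function names (`znorm`, `top_discord`, `brute_force_search`), the z-normalized Euclidean distance, and the choice to summarize a subspace by summing its member series are all assumptions made for illustration; the abstract does not specify them.

```python
from itertools import combinations
from math import sqrt


def znorm(window):
    """Z-normalize a window; a constant window maps to all zeros."""
    n = len(window)
    mean = sum(window) / n
    std = sqrt(sum((x - mean) ** 2 for x in window) / n)
    if std == 0:
        return [0.0] * n
    return [(x - mean) / std for x in window]


def top_discord(series, m):
    """Return (score, start) of the length-m window whose nearest
    non-overlapping neighbor is farthest away (brute force)."""
    windows = [znorm(series[i:i + m]) for i in range(len(series) - m + 1)]
    best_score, best_start = -1.0, -1
    for i, w in enumerate(windows):
        nearest = float("inf")
        for j, v in enumerate(windows):
            if abs(i - j) < m:  # skip trivial (overlapping) matches
                continue
            d = sqrt(sum((a - b) ** 2 for a, b in zip(w, v)))
            nearest = min(nearest, d)
        if nearest != float("inf") and nearest > best_score:
            best_score, best_start = nearest, i
    return best_score, best_start


def brute_force_search(dims, m):
    """Score every non-empty subset of dimensions: 2^N - 1 candidates."""
    names = sorted(dims)
    results = []
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            # Assumption: a subspace is summarized by summing its series.
            combined = [sum(vals) for vals in zip(*(dims[d] for d in subset))]
            score, start = top_discord(combined, m)
            results.append((score, subset, start))
    return sorted(results, reverse=True)


# Toy data: three periodic dimensions, with one spike injected into "a".
normal = [0, 1, 0, -1] * 10
spiked = list(normal)
spiked[20] = 5
ranking = brute_force_search({"a": spiked, "b": normal, "c": normal}, m=4)
print(len(ranking))  # 7 candidate subspaces for N = 3
```

With N = 3 the enumeration visits 7 subspaces; each added dimension doubles that count, which is why an approximate search becomes necessary as N grows.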
