Demystifying Numenta anomaly benchmark

Detecting anomalies in large-scale, streaming datasets has wide applicability in a myriad of domains like network intrusion detection for cyber-security, fraud detection for credit cards, system health monitoring, and fault detection in safety critical systems. Due to its wide applicability, the problem of anomaly detection has been well-studied by industry and academia alike, and many algorithms have been proposed for detecting anomalies in different problem settings. But until recently, there was no openly available, systematic dataset and/or framework using which the proposed anomaly detection algorithms could be compared and evaluated on a common ground. Numenta Anomaly Benchmark (NAB), made available by Numenta1 in 2015, addressed this gap by providing a set of openly-available, labeled data files and a common scoring system, using which different anomaly detection algorithms could be fairly evaluated and compared. In this paper, we provide an in-depth analysis of the key aspects of the NAB framework, and highlight inherent challenges therein, with the objective to provide insights about the gaps in the current framework that must be addressed so as to make it more robust and easy-to-use. Furthermore, we also provide additional evaluation of five state-of-the-art anomaly detection algorithms (including the ones proposed by Numenta) using the NAB datasets, and based on the evaluation results, we argue that the performance of these algorithms is not sufficient for practical, industry-scale applications, and must be improved upon so as to make them suitable for large-scale anomaly detection problems.

[1]  Subutai Ahmad,et al.  Evaluating Real-Time Anomaly Detection Algorithms -- The Numenta Anomaly Benchmark , 2015, 2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA).

[2]  Dipankar Dasgupta,et al.  A comparison of negative and positive selection algorithms in novel pattern detection , 2000, Smc 2000 conference proceedings. 2000 ieee international conference on systems, man and cybernetics. 'cybernetics evolving to systems, humans, organizations, and their complex interactions' (cat. no.0.

[3]  Sameer Singh,et al.  An approach to novelty detection applied to the classification of image regions , 2004, IEEE Transactions on Knowledge and Data Engineering.

[4]  A.N. Srivastava,et al.  Discovering recurring anomalies in text reports regarding complex space systems , 2005, 2005 IEEE Aerospace Conference.

[5]  VARUN CHANDOLA,et al.  Anomaly detection: A survey , 2009, CSUR.

[6]  Arun Kejariwal,et al.  A Novel Technique for Long-Term Anomaly Detection in the Cloud , 2014, HotCloud.

[7]  Hans-Peter Kriegel,et al.  LOF: identifying density-based local outliers , 2000, SIGMOD '00.

[8]  Mohammed J. Zaki,et al.  ADMIT: anomaly-based data mining for intrusions , 2002, KDD.

[9]  Dimitrios Gunopulos,et al.  Online outlier detection in sensor data using non-parametric models , 2006, VLDB.

[10]  Gilles Blanchard,et al.  Semi-Supervised Novelty Detection , 2010, J. Mach. Learn. Res..

[11]  Charu C. Aggarwal,et al.  On Abnormality Detection in Spuriously Populated Data Streams , 2005, SDM.

[12]  Stephen L. Scott,et al.  Detecting Network Intrusion Using a Markov Modulated Nonhomogeneous Poisson Process , 2000 .

[13]  Don R. Hush,et al.  A Classification Framework for Anomaly Detection , 2005, J. Mach. Learn. Res..

[14]  Sanjay Ranka,et al.  Conditional Anomaly Detection , 2007, IEEE Transactions on Knowledge and Data Engineering.

[15]  Marius Kloft,et al.  Online Anomaly Detection under Adversarial Impact , 2010, AISTATS.

[16]  Marius Kloft,et al.  Automatic feature selection for anomaly detection , 2008, AISec '08.

[17]  Marius Kloft,et al.  Toward Supervised Anomaly Detection , 2014, J. Artif. Intell. Res..

[18]  Aleksandar Lazarevic,et al.  Incremental Local Outlier Detection for Data Streams , 2007, 2007 IEEE Symposium on Computational Intelligence and Data Mining.

[19]  Mikhail J. Atallah,et al.  Detection of significant sets of episodes in event sequences , 2004, Fourth IEEE International Conference on Data Mining (ICDM'04).