Interactive Anomaly Identification with Erroneous Feedback

Large, complex systems are difficult to analyze, so efficient machine-learning tools are needed to identify unknown system anomalies, avoid critical problems, and ensure high reliability. Given system logs that contain unknown anomalies, anomaly identification models aim to identify both when each anomaly occurred and which features contributed to it. To maximize accuracy, it is important to use domain knowledge of the system in addition to the data. However, a system analyst rarely has both the domain knowledge and the machine-learning expertise needed to incorporate that knowledge into a model. In this paper, we propose a new anomaly identification framework that can use feedback based on domain knowledge without requiring any machine-learning expertise. We also propose a novel method, the rank ensemble method, that improves the accuracy of anomaly identification under erroneous feedback, that is, feedback that includes incorrect information. By assuming consistency between the data and the user feedback, our method adaptively ignores erroneous information. An intensive parameter study using benchmark datasets and a case study with real vehicle data demonstrate the applicability of our framework.
