Multicriteria Similarity-Based Anomaly Detection Using Pareto Depth Analysis

We consider the problem of identifying patterns in a data set that exhibit anomalous behavior, often referred to as anomaly detection. Similarity-based anomaly detection algorithms detect abnormally large amounts of similarity or dissimilarity, e.g., as measured by the nearest-neighbor Euclidean distances between a test sample and the training samples. In many application domains, there may not exist a single dissimilarity measure that captures all possible anomalous patterns. In such cases, multiple dissimilarity measures can be defined, including nonmetric measures, and one can test for anomalies by scalarizing them using a nonnegative linear combination. If the relative importance of the different dissimilarity measures is not known in advance, as in many anomaly detection applications, the anomaly detection algorithm may need to be executed multiple times with different choices of weights in the linear combination. In this paper, we propose a method for similarity-based anomaly detection using a novel multicriteria dissimilarity measure, the Pareto depth. The proposed Pareto depth analysis (PDA) anomaly detection algorithm uses the concept of Pareto optimality to detect anomalies under multiple criteria without having to run an algorithm multiple times with different choices of weights. The proposed PDA approach is provably better than using linear combinations of the criteria, and shows superior performance in experiments with synthetic and real data sets.
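To make the notion of Pareto depth more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each row of the input is a vector of dissimilarities under several criteria (smaller is better) and that the depth of a vector is the index of the Pareto front on which it lies when non-dominated vectors are peeled off repeatedly. The function name `pareto_depths` and the quadratic-time peeling strategy are illustrative assumptions; how depths are turned into an anomaly score is not shown here.

```python
import numpy as np

def pareto_depths(points):
    """Return the 1-based Pareto front index of each row (hypothetical helper).

    A row j dominates a row i if j is <= i in every criterion and strictly <
    in at least one. Front 1 holds the non-dominated rows; front 2 holds the
    rows that become non-dominated once front 1 is removed, and so on.
    """
    points = np.asarray(points, dtype=float)
    depths = np.zeros(points.shape[0], dtype=int)
    remaining = np.arange(points.shape[0])
    depth = 0
    while remaining.size > 0:
        depth += 1
        sub = points[remaining]
        # le[j, i]: row j is no worse than row i in every criterion.
        le = np.all(sub[:, None, :] <= sub[None, :, :], axis=2)
        # lt[j, i]: row j is strictly better than row i in some criterion.
        lt = np.any(sub[:, None, :] < sub[None, :, :], axis=2)
        dominated = (le & lt).any(axis=0)
        depths[remaining[~dominated]] = depth  # current front
        remaining = remaining[dominated]       # peel it off and repeat
    return depths

# Two criteria, five dissimilarity vectors.
print(pareto_depths([[1, 1], [2, 2], [1, 3], [3, 1], [3, 3]]))

# (0.6, 0.6) lies on the first front, yet no nonnegative weighting
# w * x + (1 - w) * y ranks it best, since min(w, 1 - w) <= 0.5 < 0.6.
print(pareto_depths([[0, 1], [1, 0], [0.6, 0.6]]))
```

The second call illustrates why a single pass of Pareto sorting can expose vectors that a sweep over linear weights would not rank first: a point on a non-convex part of the front is non-dominated but is never the minimizer of any linear combination.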
