Explaining anomalies in groups with characterizing subspace rules

Anomaly detection has numerous applications and has been studied extensively. We consider a complementary problem with a much sparser literature: anomaly description. Interpreting anomalies is crucial for practitioners, who rely on it for sense-making, troubleshooting, and planning actions. To this end, we present a new approach called x-PACS (for eXplaining Patterns of Anomalies with Characterizing Subspaces), which “reverse-engineers” the known anomalies by identifying (1) the groups (or patterns) that they form, and (2) the characterizing subspace and feature rules that separate each anomalous pattern from normal instances. Explaining anomalies in groups not only saves analyst time and gives insight into the various types of anomalies, but also draws attention to potentially critical, repeating anomalies. In developing x-PACS, we first compile a set of desiderata for the anomaly description problem. From a descriptive data mining perspective, our method satisfies all five of these desiderata: it can unearth anomalous patterns (i) of multiple different types, (ii) hidden in arbitrary subspaces of a high-dimensional space, (iii) interpretable by human analysts, (iv) distinct from the normal patterns of the data, and (v) succinct, providing a short description of the data. No existing work on anomaly description satisfies all of these properties simultaneously. Furthermore, x-PACS is highly parallelizable; its running time is linear in the number of data points and exponential only in the (typically small) size of the largest characterizing subspace. The anomalous patterns that x-PACS finds constitute interpretable “signatures” and, although it is not our primary goal, can also be used for anomaly detection. Through extensive experiments on real-world datasets, we show that x-PACS is effective and superior to various baselines at anomaly explanation, and that its detection performance is competitive with the state-of-the-art.
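
To make the idea of a characterizing subspace rule concrete, here is a minimal, hypothetical sketch, not the published x-PACS algorithm itself: given one group of known anomalies and the normal instances, it enumerates small feature subspaces, bounds the anomaly group with an axis-aligned interval rule in each, and ranks rules by how many normal points they exclude. The function name `subspace_rules`, the hyper-rectangle rule form, and the purity score are all illustrative assumptions; note that enumerating subspaces of up to `max_dim` features is exactly what makes such a search exponential in the largest subspace size, as the abstract states.

```python
import itertools

import numpy as np


def subspace_rules(X_anom, X_norm, max_dim=2, top_k=3):
    """Score axis-aligned interval rules over small feature subspaces.

    Toy illustration only (NOT the published x-PACS algorithm): a rule is
    the bounding hyper-rectangle of the anomaly group in a subspace, and
    its score ('purity') is the fraction of normal points it excludes.
    """
    n_features = X_anom.shape[1]
    rules = []
    for k in range(1, max_dim + 1):                      # subspace size
        for dims in itertools.combinations(range(n_features), k):
            d = list(dims)
            lo = X_anom[:, d].min(axis=0)                # rule: lo_j <= x_j <= hi_j
            hi = X_anom[:, d].max(axis=0)
            inside = np.all((X_norm[:, d] >= lo) & (X_norm[:, d] <= hi), axis=1)
            purity = 1.0 - inside.mean()                 # normals excluded by the rule
            rules.append((purity, dims, lo, hi))
    # Prefer the purest rules and, on ties, the most succinct subspaces.
    rules.sort(key=lambda r: (-r[0], len(r[1])))
    return rules[:top_k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_norm = rng.normal(0.0, 1.0, size=(500, 5))         # normal instances
    # One anomalous group, shifted only in features 0 and 1.
    X_anom = rng.normal(0.0, 0.2, size=(20, 5))
    X_anom[:, :2] += 4.0
    for purity, dims, lo, hi in subspace_rules(X_anom, X_norm):
        bounds = ", ".join(f"{l:.2f} <= x{j} <= {h:.2f}" for j, l, h in zip(dims, lo, hi))
        print(f"subspace {dims}: purity={purity:.2f}, rule: {bounds}")
```

Running the demo prints the purest, most succinct rules, which here land on the shifted features 0 and 1. A real method would of course need a principled selection criterion (e.g., a statistical or description-length score) rather than this toy purity measure, and would handle multiple anomalous groups rather than a single pre-identified one.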
