There and back again: Outlier detection between statistical reasoning and data mining algorithms

Outlier detection has been a topic in statistics for centuries. Mainly over the last two decades, the database and data mining community has also shown increasing interest in developing scalable methods for outlier detection. Although these methods were initially based on statistical reasoning, they soon lost the direct probabilistic interpretability of the outlier scores they derive. Here, from a joint data mining and statistics perspective, we detail the roots and the path of development of statistical outlier detection and of database-related data mining methods for outlier detection. We discuss their inherent meaning, review approaches that aim to recover a statistically meaningful interpretation of outlier scores, and sketch related current research topics.
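The contrast drawn above, between statistically interpretable outlier criteria and raw data mining scores, can be illustrated with a minimal pure-Python sketch. Under a classical Gaussian model, a deviation translates directly into a tail probability. A typical data mining score, here the distance to the k-th nearest neighbor, only ranks points on an arbitrary, dataset-dependent scale. A normalization step can then map such raw scores back to [0, 1]. The function names, the choice of the kNN distance as the example score, and the use of Gaussian scaling as the normalization are illustrative assumptions, not the specific methods discussed in this article.

```python
import math
import statistics


def gaussian_tail_probability(x, mu, sigma):
    """Classical statistical reading: two-sided probability of observing a
    deviation at least as large as |x - mu| under a Gaussian N(mu, sigma^2)."""
    return math.erfc(abs(x - mu) / (sigma * math.sqrt(2)))


def knn_distance_scores(points, k):
    """Data mining reading (illustrative): distance of each point to its k-th
    nearest neighbor. Larger means more outlying, but the scale is arbitrary."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def gaussian_scaling(scores):
    """Map raw scores to [0, 1], assuming the scores are roughly Gaussian:
    max(0, erf((s - mean) / (std * sqrt(2)))). Scores at or below the mean
    map to 0. Assumes the scores are not all identical (std > 0)."""
    mu = statistics.mean(scores)
    sigma = statistics.pstdev(scores)
    return [max(0.0, math.erf((s - mu) / (sigma * math.sqrt(2)))) for s in scores]


# Four tightly grouped points and one distant point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
raw = knn_distance_scores(points, k=2)  # unbounded, dataset-dependent scale
probs = gaussian_scaling(raw)           # rescaled values in [0, 1]
```

In this toy example the four grouped points get a raw score of 0.1 and a scaled value of 0, while the isolated point gets a raw score of about 7.0 and a scaled value of about 0.95. The scaled value is only as meaningful as the Gaussian assumption placed on the score distribution, which is exactly the kind of modeling choice that a statistical interpretation of outlier scores has to make explicit.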
