Outlier Detection with One-Class Classifiers from ML and KDD

The problem of outlier detection is well studied in the fields of Machine Learning (ML) and Knowledge Discovery in Databases (KDD), and both fields have their own methods and evaluation procedures. In ML, Support Vector Machines and Parzen Windows are well-known methods that can be used for outlier detection. In KDD, the heuristic local-density estimation methods LOF and LOCI are generally considered to be superior outlier-detection methods. To date, the performance of these ML and KDD methods has not been compared directly. This paper formalizes LOF and LOCI in the ML framework of one-class classification and performs a comparative evaluation of the ML and KDD outlier-detection methods on real-world datasets. Experimental results show that LOF and the Support Vector Data Description (SVDD) are the two best-performing methods. It is concluded that both fields offer outlier-detection methods that are competitive in performance and that bridging the gap between the fields may facilitate the development of improved outlier-detection methods.
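To make the one-class setting concrete, the following is a minimal sketch (not the paper's experimental code) of outlier detection with two of the compared method families, using scikit-learn's LocalOutlierFactor and OneClassSVM; the latter is the nu-SVM formulation, which with an RBF kernel is closely related to SVDD. The synthetic data, parameter values, and AUC-based scoring below are illustrative assumptions, not the paper's datasets or settings.

```python
# Minimal sketch (assumed setup, not the paper's code): outlier detection cast as
# one-class classification, with LOF and a one-class SVM trained on target-class
# data only and scored on a held-out mix of inliers and outliers.
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.neighbors import LocalOutlierFactor
from sklearn.metrics import roc_auc_score

rng = np.random.RandomState(0)

# Target (inlier) class: a Gaussian cluster; outliers: uniform background noise.
X_train = rng.randn(200, 2)                        # training data: inliers only
X_inliers = rng.randn(100, 2)
X_outliers = rng.uniform(-6, 6, size=(20, 2))
X_test = np.vstack([X_inliers, X_outliers])
y_test = np.r_[np.ones(100), np.zeros(20)]         # 1 = inlier, 0 = outlier

# One-class SVM fitted on the target class (the ML-style one-class classifier).
ocsvm = OneClassSVM(kernel="rbf", gamma=0.1, nu=0.1).fit(X_train)
svm_scores = ocsvm.decision_function(X_test)       # higher = more inlier-like

# LOF in novelty mode, so it can score unseen test points like a classifier.
lof = LocalOutlierFactor(n_neighbors=20, novelty=True).fit(X_train)
lof_scores = lof.decision_function(X_test)         # higher = more inlier-like

print("one-class SVM AUC:", roc_auc_score(y_test, svm_scores))
print("LOF AUC:          ", roc_auc_score(y_test, lof_scores))
```

Training on target-class examples only mirrors the one-class classification framework in which the paper casts LOF and LOCI; LOCI itself is omitted from the sketch because scikit-learn offers no implementation of it.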
