A Comparative Study of Outlier Detection Algorithms

Data mining is the process of extracting interesting information from large data sets. Outliers are events that occur very infrequently. Detecting outliers before they escalate into potentially catastrophic consequences is important in many real-life applications, such as fraud detection, network robustness analysis, and intrusion detection. This paper presents a comprehensive analysis of three outlier detection methods, the Extensible Markov Model (EMM), the Local Outlier Factor (LOF), and LSC-Mine, comparing their time complexity and their outlier detection accuracy. Experiments conducted on the Ozone Level Detection dataset, IR video trajectories, and the 1999 and 2000 DARPA DDoS datasets demonstrate that EMM outperforms LOF and LSC-Mine in both running time and outlier detection accuracy.
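As a point of reference for the LOF baseline named above, the following is a minimal brute-force sketch of the standard LOF score as defined by Breunig et al. (SIGMOD 2000). It assumes Euclidean distance, distinct points, and 1 <= k < n. The function name `lof_scores` is hypothetical, and this is not the implementation evaluated in the paper. The all-pairs distance matrix makes the quadratic cost of the naive approach explicit.

```python
import math


def lof_scores(points, k):
    """Brute-force Local Outlier Factor (hypothetical helper, O(n^2) distances).

    Assumes distinct points (tuples of floats), Euclidean distance, 1 <= k < n.
    """
    n = len(points)
    # Full pairwise distance matrix: this is where the quadratic cost comes from.
    dist = [[math.dist(p, q) for q in points] for p in points]

    k_distance = []
    neighbors = []
    for i in range(n):
        # Distances from point i to every other point, sorted ascending.
        others = sorted(dist[i][j] for j in range(n) if j != i)
        kd = others[k - 1]  # distance to the k-th nearest neighbour
        k_distance.append(kd)
        # k-distance neighbourhood: all points within kd (ties may yield more than k).
        neighbors.append([j for j in range(n) if j != i and dist[i][j] <= kd])

    # Local reachability density: inverse of the mean reachability distance,
    # where reach-dist(p, o) = max(k-distance(o), d(p, o)).
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], dist[i][j]) for j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean ratio of the neighbours' densities to the point's own density.
    return [
        sum(lrd[j] for j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(n)
    ]


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
    # The four clustered points score 1.0; the isolated point scores about 13.2.
    print(lof_scores(pts, 2))
```

A score near 1 means a point is about as dense as its neighbourhood, while scores well above 1 mark local outliers. Library implementations typically replace the all-pairs loop with a spatial index for the neighbour queries.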
