Anomaly Detection via Online Oversampling Principal Component Analysis

Anomaly detection is an important research topic in data mining and machine learning. Many real-world applications, such as intrusion detection or credit card fraud detection, require an effective and efficient framework to identify deviated data instances. However, most anomaly detection methods are implemented in batch mode, and thus cannot be easily extended to large-scale problems without incurring heavy computation and memory costs. In this paper, we propose an online oversampling principal component analysis (osPCA) algorithm to address this problem; it aims to detect the presence of outliers in a large amount of data via an online updating technique. Unlike prior principal component analysis (PCA)-based approaches, we do not store the entire data matrix or covariance matrix, and thus our approach is of particular interest for online or large-scale problems. By oversampling the target instance and extracting the principal direction of the data, the proposed osPCA allows us to determine how anomalous the target instance is according to the variation of the resulting dominant eigenvector. Because osPCA does not need to perform eigen-analysis explicitly, the proposed framework is well suited to online applications that have computation or memory limitations. Experimental comparisons with the well-known power method for PCA and with other popular anomaly detection algorithms verify the feasibility of the proposed method in terms of both accuracy and efficiency.
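
The oversampling idea described above can be illustrated with a minimal batch-mode sketch in Python. It is an illustration under stated assumptions, not the online osPCA update proposed in the paper: it stores the full data matrix and calls an explicit eigensolver, which is exactly what osPCA is designed to avoid. The function names, the `ratio` parameter, the score definition (one minus the absolute cosine between the two dominant eigenvectors), and the synthetic data are all assumptions made for this example.

```python
import numpy as np

def dominant_direction(X):
    """Return the unit-norm first principal direction of the rows of X."""
    Xc = X - X.mean(axis=0)                 # center the data
    cov = Xc.T @ Xc / X.shape[0]            # sample covariance matrix
    _, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    return eigvecs[:, -1]                   # eigenvector of the largest eigenvalue

def oversampling_score(X, idx, ratio=0.1):
    """Score instance idx by how much duplicating it rotates the
    dominant principal direction (hypothetical score definition)."""
    n_dup = max(1, int(round(ratio * X.shape[0])))
    u = dominant_direction(X)
    # Oversample the target instance by appending n_dup copies of it.
    X_over = np.vstack([X, np.tile(X[idx], (n_dup, 1))])
    u_over = dominant_direction(X_over)
    # Eigenvectors are defined up to sign, so compare via |cosine|.
    cos = min(abs(float(u @ u_over)), 1.0)
    return 1.0 - cos

# Synthetic data (assumed for illustration): 200 points spread along
# the x-axis, plus one deviated instance appended as the last row.
rng = np.random.default_rng(0)
inliers = np.column_stack([rng.normal(0.0, 2.0, 200),
                           rng.normal(0.0, 0.1, 200)])
X = np.vstack([inliers, [5.0, 8.0]])

scores = np.array([oversampling_score(X, i) for i in range(X.shape[0])])
print("index with the largest score:", int(np.argmax(scores)))
```

In this sketch, duplicating an ordinary instance barely changes the principal direction, while duplicating the deviated instance rotates it noticeably, so the deviated instance receives the largest score.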
