GPU-Accelerated Feature Selection for Outlier Detection Using the Local Kernel Density Ratio

Effective outlier detection requires the data to be described by a set of features that captures the behavior of normal data while emphasizing the characteristics of outliers that distinguish them from normal data. In this work, we present a novel non-parametric evaluation criterion for filter-based feature selection which caters to outlier detection problems. The proposed method seeks the subset of features that represents the inherent characteristics of the normal dataset while forcing outliers to stand out, making them more easily distinguished by outlier detection algorithms. Experimental results on real datasets show the advantage of our feature selection algorithm compared to popular and state-of-the-art methods. We also show that the proposed algorithm is able to overcome the small sample space problem and perform well on highly imbalanced datasets. Furthermore, due to the highly parallelizable nature of the feature selection, we implement the algorithm on a graphics processing unit (GPU) to gain significant speedup over the serial version. The benefits of the GPU implementation are two-fold, as its performance scales very well in terms of the number of features as well as the number of data points.
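
Below is a minimal sketch of one plausible filter-style criterion in the spirit described above: a Parzen-window (Gaussian kernel) density estimate is built from the normal data, and a candidate feature subset is scored by how dense the normal points are versus how sparse the labeled outliers are under that estimate. The function names, the exact score, and the greedy forward search are illustrative assumptions, not the paper's definition of the local kernel density ratio.

```python
# Hypothetical sketch of a density-ratio-style feature selection filter.
# The scoring function and search strategy are assumptions for illustration only.
import numpy as np

def gaussian_kernel_density(points, queries, bandwidth=1.0):
    """Parzen-window (Gaussian kernel) density estimate of `queries` w.r.t. `points`."""
    # Pairwise squared Euclidean distances, shape (n_queries, n_points).
    d2 = ((queries[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    d = queries.shape[1]
    norm = (2.0 * np.pi * bandwidth ** 2) ** (d / 2.0)
    return np.exp(-d2 / (2.0 * bandwidth ** 2)).mean(axis=1) / norm

def density_ratio_score(X_normal, X_outlier, feature_idx, bandwidth=1.0):
    """Score a candidate feature subset: higher when normal points fall in dense
    regions of the normal-data density while labeled outliers fall in sparse ones."""
    Xn = X_normal[:, feature_idx]
    Xo = X_outlier[:, feature_idx]
    dens_normal = gaussian_kernel_density(Xn, Xn, bandwidth)   # density at normal points
    dens_outlier = gaussian_kernel_density(Xn, Xo, bandwidth)  # density at outliers
    eps = 1e-12  # avoid log(0) when an outlier is far from all normal points
    return np.log(dens_normal + eps).mean() - np.log(dens_outlier + eps).mean()

def greedy_forward_select(X_normal, X_outlier, k, bandwidth=1.0):
    """Greedy forward selection of k features driven by the score above."""
    selected, remaining = [], list(range(X_normal.shape[1]))
    for _ in range(k):
        best = max(remaining,
                   key=lambda j: density_ratio_score(X_normal, X_outlier,
                                                     selected + [j], bandwidth))
        selected.append(best)
        remaining.remove(best)
    return selected
```

The dominant cost in a sketch like this is the pairwise-distance computation inside the kernel density estimate, which is exactly the kind of data-parallel workload that maps naturally onto a GPU, consistent with the speedup and scaling behavior reported in the abstract.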
