Fast One-class Classification using Class Boundary-preserving Random Projections

Several applications, such as malicious URL detection and web spam detection, require classification on very high-dimensional data. In such cases anomalous data is hard to find, but normal data is easily available, so it is increasingly common to use a one-class classifier (OCC). Unfortunately, most OCC algorithms cannot scale to datasets with extremely high dimensions. In this paper, we present Fast Random projection-based One-Class Classification (FROCC), an extremely efficient, scalable and easily parallelizable method for one-class classification with provable theoretical guarantees. Our method is based on the simple idea of transforming the training data by projecting it onto a set of random unit vectors chosen uniformly and independently from the unit sphere, and bounding the regions based on the separation of the data. FROCC can be naturally extended with kernels. We provide a new theoretical framework to prove that FROCC generalizes well, in the sense that it is stable and has low bias for some parameter settings. We then develop a fast, scalable approximation of FROCC that uses vectorization and exploits data sparsity and parallelism, yielding a new implementation called ParDFROCC. ParDFROCC achieves up to 2 percentage points better ROC than the next best baseline, with up to 12× speedup in training and test times over a range of state-of-the-art benchmarks for the OCC task.
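The core idea described above (project onto random unit vectors, then bound the occupied regions along each direction) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the absolute gap threshold `eps`, and the rule that a point is accepted only if it falls inside an interval along every direction are assumptions made for illustration.

```python
import numpy as np

def sample_unit_vectors(m, d, rng):
    # Normalized Gaussian vectors are uniformly distributed on the unit sphere.
    w = rng.standard_normal((m, d))
    return w / np.linalg.norm(w, axis=1, keepdims=True)

def fit_intervals(X, W, eps):
    # For each direction, sort the projected training points and split them
    # into intervals wherever two neighbours are more than eps apart.
    # (Assumption: eps is an absolute gap threshold chosen by the user.)
    intervals = []
    for w in W:
        p = np.sort(X @ w)
        breaks = np.nonzero(np.diff(p) > eps)[0]
        starts = np.concatenate(([p[0]], p[breaks + 1]))
        ends = np.concatenate((p[breaks], [p[-1]]))
        intervals.append(np.stack([starts, ends], axis=1))
    return intervals

def predict_inlier(X, W, intervals):
    # Assumption: a point is labelled normal only if its projection lies
    # inside some training interval along every direction.
    ok = np.ones(X.shape[0], dtype=bool)
    for w, iv in zip(W, intervals):
        p = (X @ w)[:, None]
        ok &= ((p >= iv[:, 0]) & (p <= iv[:, 1])).any(axis=1)
    return ok

rng = np.random.default_rng(0)
X_train = rng.standard_normal((200, 5))   # synthetic "normal" data
W = sample_unit_vectors(20, 5, rng)
intervals = fit_intervals(X_train, W, eps=0.5)
train_labels = predict_inlier(X_train, W, intervals)
far_label = predict_inlier(np.full((1, 5), 100.0), W, intervals)
```

Each direction is processed independently, which is why a scheme of this kind parallelizes and vectorizes easily; the projections `X @ w` also work unchanged if `X` is a sparse matrix type that supports the `@` operator.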
