Fast low-rank matrix approximation with locality sensitive hashing for quick anomaly detection

Detecting anomalous traffic is a critical task for advanced Internet management. Traditional approaches based on Principal Component Analysis (PCA) are effective only when the corruption is caused by small additive i.i.d. Gaussian noise. The more recent Direct Robust Matrix Factorization (DRMF) has been proven more robust and accurate for anomaly detection, but it incurs a high computation cost: it relies on singular value decomposition (SVD) for low-rank matrix approximation, and it must execute SVD repeatedly to reach the final solution. To enable DRMF-based anomaly detection on large traffic matrices, we formulate low-rank matrix approximation as the problem of searching for the subspace onto which the traffic matrix can be projected with minimum error. We propose a novel approach, LSH-subspace, for fast low-rank matrix approximation. To facilitate partitioning the matrix for a quick subspace search, we propose several novel techniques: a multi-layer locality sensitive hashing (LSH) table that reorders the origin-destination (OD) pairs according to an LSH function, a partition principle that guides the partition to minimize the projection error, and a lightweight algorithm that exploits the sparsity of the outlier matrix to update the LSH table at low overhead. Extensive simulations based on real trace data demonstrate that LSH-subspace is 3 times faster than DRMF while maintaining high anomaly detection accuracy.
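
As a rough illustration of reordering OD pairs with an LSH function, the minimal sketch below hashes each OD pair's traffic time series and groups pairs that share a hash key. It is not the paper's multi-layer table: the abstract does not say which LSH family is used, so the Gaussian (2-stable) projection hash here is an assumption, and the names `make_hash` and `reorder_by_lsh` and the parameters `num_hashes` and `width` are hypothetical.

```python
import math
import random
from collections import defaultdict


def make_hash(dim, width, rng):
    """Build one hash h(v) = floor((a . v + b) / width) with Gaussian a (assumed LSH family)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, width)

    def h(v):
        projection = sum(ai * vi for ai, vi in zip(a, v))
        return math.floor((projection + b) / width)

    return h


def reorder_by_lsh(rows, num_hashes=4, width=4.0, seed=0):
    """Group rows (one per OD pair) by their concatenated hash key.

    Returns a row ordering in which rows sharing a key are contiguous,
    together with the bucket table mapping each key to its row indices.
    """
    rng = random.Random(seed)
    dim = len(rows[0])
    hashes = [make_hash(dim, width, rng) for _ in range(num_hashes)]

    buckets = defaultdict(list)
    for idx, row in enumerate(rows):
        key = tuple(h(row) for h in hashes)
        buckets[key].append(idx)

    # Visit buckets in sorted key order so the ordering is deterministic.
    order = [idx for key in sorted(buckets) for idx in buckets[key]]
    return order, dict(buckets)


if __name__ == "__main__":
    # Toy traffic matrix: each row is one OD pair observed over three time slots.
    traffic = [[1.0, 2.0, 3.0], [10.0, -5.0, 7.0], [1.0, 2.0, 3.0]]
    order, buckets = reorder_by_lsh(traffic)
    print(order)
    print(buckets)
```

Rows with similar time series tend to collide under such a hash, so after reordering, neighbouring rows are likely to lie close to a common subspace. That is what would make a subsequent partition of the matrix cheap to search, though how the paper partitions and updates the table is not specified here.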
