Incremental commute time and its online applications

Abstract. Commute time is a robust measure on graphs based on random walks. It has been applied successfully in many domains, including personalized search, collaborative filtering and network intrusion detection. However, computing the commute time is expensive because it involves the eigendecomposition of the graph Laplacian matrix. There have been efforts to approximate the commute time, but they work only in an offline mode. In this work, we propose an accurate and efficient approximation that computes the commute time incrementally in order to facilitate online applications. Using the incremental commute time, we design an online anomaly detection application in which the commute time from each newly arriving data point to any point in the current graph can be estimated in constant time. The proposed approach achieves high accuracy and efficiency on synthetic and real datasets for online applications. It takes only 8 milliseconds on average to detect anomalies online on the DBLP graph, which has more than 600,000 nodes and 2 million edges. We also discuss the use of incremental commute time for other online applications such as classification, graph ranking and clustering.
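
For readers unfamiliar with the quantity being approximated, the following is a minimal sketch of an exact commute time computation on a small graph. It is not the incremental method proposed in this work, and it does not use the eigendecomposition mentioned above. It relies instead on the standard identity that the commute time between nodes i and j equals the graph volume (sum of degrees) times the effective resistance between them, and obtains the resistance by grounding node j and solving the reduced Laplacian system. The function names `laplacian`, `solve` and `commute_time` are hypothetical, and the graph is assumed to be connected, undirected and unweighted, with no repeated edges.

```python
from fractions import Fraction


def laplacian(n, edges):
    """Build the unweighted graph Laplacian L = D - A as a list of rows."""
    L = [[Fraction(0)] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1
        L[v][v] += 1
        L[u][v] -= 1
        L[v][u] -= 1
    return L


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination (A must be nonsingular)."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and swap it up.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        # Scale the pivot row so the pivot entry becomes 1.
        M[col] = [x / M[col][col] for x in M[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [row[n] for row in M]


def commute_time(n, edges, i, j):
    """Commute time between i and j: volume times effective resistance.

    Assumes a connected, undirected, unweighted graph with no repeated edges.
    """
    if i == j:
        return Fraction(0)
    L = laplacian(n, edges)
    # Ground node j: drop its row and column, then inject unit current at i.
    keep = [k for k in range(n) if k != j]
    A = [[L[r][c] for c in keep] for r in keep]
    b = [Fraction(1) if k == i else Fraction(0) for k in keep]
    potentials = solve(A, b)
    resistance = potentials[keep.index(i)]
    volume = 2 * len(edges)  # sum of degrees in an unweighted graph
    return volume * resistance


# Path graph 0 - 1 - 2: a random walk needs 4 expected steps each way.
print(commute_time(3, [(0, 1), (1, 2)], 0, 2))  # prints 8
```

Exact rational arithmetic keeps the toy example free of rounding error. A dense solve like this costs cubic time in the number of nodes, which illustrates why an exact computation does not scale to graphs of the size reported for DBLP.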
