A Tractable Approach to Finding Closest Truncated-commute-time Neighbors in Large Graphs

Graph-based learning has recently attracted much interest, with applications in collaborative filtering for recommender networks, link prediction for social networks, and fraud detection. These networks can consist of millions of entities, so highly efficient techniques are essential. We are especially interested in accelerating random-walk approaches for computing proximity measures on such graphs. These measures have been shown to perform well empirically (Liben-Nowell & Kleinberg, 2003; Brand, 2005). We introduce a truncated variant of a well-known measure, the commute time arising from random walks on graphs. We present a novel algorithm that computes all interesting pairs of approximate nearest neighbors in truncated commute time without computing the measure between all pairs. We show results on both simulated and real graphs with up to 100,000 entities, which indicate near-linear scaling in computation time.
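
To make the measure concrete, the following is a minimal sketch of one common way to define a truncated commute time; the abstract does not state the definition, so the recurrence is an assumption. It takes the T-truncated hitting time from i to j to be h^T(i, j) = 1 + (1/deg(i)) Σ_k h^(T-1)(k, j) over the neighbors k of i, with h^T(j, j) = 0 and h^0 = 0, and the truncated commute time to be h^T(i, j) + h^T(j, i). The function names and the adjacency-list input are hypothetical. Note that this sketch computes the measure for all pairs by brute force, which is exactly the cost the paper's algorithm is designed to avoid; it is only a reference for small graphs.

```python
def truncated_hitting_times(adj, T):
    """Brute-force T-truncated hitting times on an unweighted graph.

    adj: dict mapping each node to a non-empty list of neighbours
         (assumption: no isolated nodes).
    Returns h with h[i][j] = expected number of steps, capped at T,
    for a random walk started at i to first reach j.
    """
    nodes = list(adj)
    # Base case (assumed definition): h^0(i, j) = 0 for every pair.
    h = {i: {j: 0.0 for j in nodes} for i in nodes}
    for _ in range(T):
        new_h = {}
        for i in nodes:
            row = {}
            for j in nodes:
                if i == j:
                    row[j] = 0.0  # already at the target
                else:
                    # One step, then average over uniformly chosen neighbours.
                    avg = sum(h[k][j] for k in adj[i]) / len(adj[i])
                    row[j] = 1.0 + avg
            new_h[i] = row
        h = new_h
    return h


def truncated_commute_times(adj, T):
    """Symmetric measure: c[i][j] = h[i][j] + h[j][i]."""
    h = truncated_hitting_times(adj, T)
    return {i: {j: h[i][j] + h[j][i] for j in adj} for i in adj}


# Path graph 0 - 1 - 2, walks truncated at T = 2 steps.
path = {0: [1], 1: [0, 2], 2: [1]}
c = truncated_commute_times(path, 2)
print(c[0][1], c[0][2])
```

Under this definition each hitting time is at most T, so each commute time is at most 2T; pairs that sit at the cap are too far apart to be distinguished within the truncation horizon.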

[1]  Leo Katz,et al.  A new status index derived from sociometric analysis , 1953 .

[2]  Stanley Milgram.  The Small World Problem , 1967 .

[3]  Peter G. Doyle,et al.  Random walks and electric networks , 1987, math/0001057.

[4]  Michael I. Jordan,et al.  On Spectral Clustering: Analysis and an algorithm , 2001, NIPS.

[5]  O. Haggstrom.  Reversible Markov chains , 2002 .

[6]  Jon Kleinberg,et al.  The link prediction problem for social networks , 2003, CIKM '03.

[7]  Zoubin Ghahramani,et al.  Combining active learning and semi-supervised learning using Gaussian fields and harmonic functions , 2003, ICML 2003.

[8]  Alexander J. Smola,et al.  Kernels and Regularization on Graphs , 2003, COLT.

[9]  Robert Krauthgamer,et al.  The intrinsic dimensionality of graphs , 2003, STOC '03.

[10]  E. Schwartz,et al.  Isoperimetric Graph Partitioning for Data Clustering and Image Segmentation , 2003 .

[11]  R. Basri,et al.  Shape representation and classification using the Poisson equation , 2004, CVPR 2004.

[12]  François Fouss,et al.  The Principal Components Analysis of a Graph, and Its Relationships to Spectral Clustering , 2004, ECML.

[13]  Matthew Brand,et al.  A Random Walks Perspective on Maximizing Satisfaction and Profit , 2005, SDM.

[14]  Clustering Using a Random Walk Based Distance Measure , 2005, ESANN.

[15]  Edwin R. Hancock,et al.  Image Segmentation using Commute Times , 2005, BMVC.

[16]  Prabhakar Raghavan,et al.  The electrical resistance of a graph captures its commute and cover times , 2005, computational complexity.

[17]  Edwin R. Hancock,et al.  Robust Multi-body Motion Tracking Using Commute Time Clustering , 2006, ECCV.

[18]  Leo Grady,et al.  Isoperimetric Partitioning: A New Algorithm for Graph Partitioning , 2005, SIAM J. Sci. Comput..

[19]  Leo Grady,et al.  Isoperimetric graph partitioning for image segmentation , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Jon M. Kleinberg,et al.  The link-prediction problem for social networks , 2007, J. Assoc. Inf. Sci. Technol..

[21]  L. Lovász.  Random Walks on Graphs: a Survey , 2022 .