Paper information - Second-order random walk-based proximity measures in graph analysis: formulations and algorithms

Measuring the proximity between different nodes is a fundamental problem in graph analysis. Random walk-based proximity measures have been shown to be effective and widely used. Most existing random walk measures are based on the first-order Markov model, i.e., they assume that the next step of the random surfer only depends on the current node. However, this assumption neither holds in many real-life applications nor captures the clustering structure in the graph. To address the limitation of the existing first-order measures, in this paper, we study the second-order random walk measures, which take the previously visited node into consideration. While the existing first-order measures are built on node-to-node transition probabilities, in the second-order random walk, we need to consider the edge-to-edge transition probabilities. Using incidence matrices, we develop simple and elegant matrix representations for the second-order proximity measures. A desirable property of the developed measures is that they degenerate to their original first-order forms when the effect of the previous step is zero. We further develop Monte Carlo methods to efficiently compute the second-order measures and provide theoretical performance guarantees. Experimental results show that in a variety of applications, the second-order measures can dramatically improve the performance compared to their first-order counterparts.
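The idea of letting the previous node influence the next step, and of estimating a proximity score by Monte Carlo simulation, can be illustrated with a small sketch. This is not the paper's formulation: the blending parameter `alpha`, the triangle-closing memory rule, the helper names `next_node` and `monte_carlo_rwr`, and the toy graph are all assumptions made for illustration. The sketch only mirrors two properties stated in the abstract: the step depends on the (previous, current) pair, and setting the memory effect to zero recovers the ordinary first-order walk.

```python
import random
from collections import Counter


def next_node(graph, prev, cur, alpha, rng):
    """Sample the next node given the previous and current nodes.

    Hypothetical rule: mix a uniform first-order step (weight 1 - alpha)
    with a memory term (weight alpha) that favors neighbors of `cur`
    that are also adjacent to `prev`. With alpha == 0 this is exactly
    the first-order uniform random walk.
    """
    neighbors = graph[cur]
    if prev is None or alpha == 0.0:
        return rng.choice(neighbors)
    shared = [x for x in neighbors if x in graph[prev]]
    if not shared:
        # No common neighbor: fall back to the first-order step.
        return rng.choice(neighbors)
    uniform = 1.0 / len(neighbors)
    weights = [
        (1 - alpha) * uniform + (alpha / len(shared) if x in shared else 0.0)
        for x in neighbors
    ]
    return rng.choices(neighbors, weights=weights, k=1)[0]


def monte_carlo_rwr(graph, query, alpha=0.5, restart=0.15,
                    num_walks=10000, seed=0):
    """Estimate random-walk-with-restart scores from `query`.

    Each walk stops with probability `restart` before every step; the
    fraction of walks ending at a node is the estimate of its score.
    Assumes every node has at least one neighbor.
    """
    rng = random.Random(seed)
    ends = Counter()
    for _ in range(num_walks):
        prev, cur = None, query
        while rng.random() >= restart:
            prev, cur = cur, next_node(graph, prev, cur, alpha, rng)
        ends[cur] += 1
    return {node: count / num_walks for node, count in ends.items()}


# Toy undirected graph: two triangles (0-1-2 and 3-4-5) joined by edge 2-3.
toy_graph = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}

if __name__ == "__main__":
    for a in (0.0, 0.9):
        scores = monte_carlo_rwr(toy_graph, query=0, alpha=a)
        print(a, sorted(scores.items()))
```

In this sketch, comparing the output for `alpha=0.0` and a larger `alpha` shows how a memory term can shift probability mass relative to the first-order walk. More walks reduce the sampling error of the estimates at the cost of running time.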
