Random Walks in Social Networks and their Applications: A Survey

Many real-world applications, such as friend suggestion in social networks, keyword search in databases, and web-spam detection, can be framed as ranking entities in a graph. Obtaining such a ranking requires a graph-theoretic measure of similarity, which ideally captures the information hidden in the graph structure. For example, two entities can be considered similar if there are many short paths between them. Random walks have proven to be a simple yet powerful mathematical tool for extracting information from the ensemble of paths between entities in a graph. Because real-world graphs are enormous and complex, ranking with random walks remains an active area of research. Work in this area ranges from new applications to novel algorithms and mathematical analysis, bringing together ideas from different branches of statistics, mathematics, and computer science. In this book chapter, we describe different random-walk-based proximity measures, their applications, and existing algorithms for computing them.
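To make the "many short paths" intuition concrete, here is a minimal sketch of one well-known random-walk-based proximity measure, a random walk with restart (often called personalized PageRank), computed by plain power iteration. The function name, the toy graph, and the restart probability of 0.15 are illustrative assumptions, not taken from the chapter. In the toy graph, nodes `x` and `y` are both two hops from the source `s`, but `x` is reachable by two distinct short paths and `y` by only one.

```python
def random_walk_with_restart(adj, source, restart=0.15, iters=100):
    """Approximate restart-walk proximity from `source` to every node.

    adj: dict mapping each node to a list of its neighbours; every node is
         assumed to have at least one neighbour (hypothetical toy format).
    At each step the walker jumps back to `source` with probability `restart`
    and otherwise moves to a uniformly chosen neighbour.
    """
    nodes = list(adj)
    score = {v: 0.0 for v in nodes}
    score[source] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in nodes}
        for u in nodes:
            # Mass that continues walking is split evenly among neighbours.
            share = (1.0 - restart) * score[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        # Mass that restarts returns to the source node.
        nxt[source] += restart
        score = nxt
    return score


# Toy undirected graph (illustrative only): s-p-x, s-q-x, and s-r-y.
graph = {
    "s": ["p", "q", "r"],
    "p": ["s", "x"],
    "q": ["s", "x"],
    "r": ["s", "y"],
    "x": ["p", "q"],
    "y": ["r"],
}

scores = random_walk_with_restart(graph, "s")
ranking = sorted((v for v in graph if v != "s"), key=scores.get, reverse=True)
print(ranking)
print(scores["x"] > scores["y"])
```

Under these assumptions the walk assigns `x` a higher score than `y` even though both are at the same shortest-path distance from `s`, which is the kind of path-ensemble information that a plain distance measure misses.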
