ASCOS++: An Asymmetric Similarity Measure for Weighted Networks to Address the Problem of SimRank

In this article, we explore the relationships among digital objects in terms of their similarity, as captured by vertex similarity measures. We argue that SimRank, a well-known similarity measure, and its variants, such as P-Rank and SimRank++, fail to capture similar node pairs under certain conditions, especially when two nodes can reach each other only through paths of odd length. We present two new similarity measures, ASCOS and ASCOS++, to address this problem. ASCOS outputs a more complete similarity score than SimRank and its variants. ASCOS++ extends ASCOS to incorporate edge weights into the measure, giving all edges and network weights an opportunity to make their contribution. We show that both ASCOS and ASCOS++ can be reformulated and applied in a distributed environment for parallel computation. Experimental results show that ASCOS++ reports a better score than SimRank and several other well-known similarity measures. Finally, we re-examine previous use cases of SimRank and explain which of them are appropriate and which are not. We suggest that future SimRank users follow the rules proposed here before naïvely applying it. We also discuss the relationship between ASCOS++ and PageRank.
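
The odd-length-path limitation of SimRank mentioned above can be illustrated with a small sketch. The code below is not taken from the article: it is a naive implementation of the standard iterative SimRank definition, applied to an undirected four-node path graph. The function name `simrank`, the decay factor 0.8, the iteration count, and the example graph are all assumptions chosen for illustration. It does not implement ASCOS or ASCOS++.

```python
from itertools import product


def simrank(neighbors, decay=0.8, iterations=50):
    """Naive iterative SimRank on an undirected graph.

    neighbors: dict mapping each node to a list of its neighbors
    (used as the in-neighbor set, since the graph is undirected).
    decay and iterations are illustrative assumptions.
    """
    nodes = list(neighbors)
    # Start from the identity: a node is fully similar only to itself.
    sim = {(u, v): 1.0 if u == v else 0.0 for u, v in product(nodes, nodes)}
    for _ in range(iterations):
        new = {}
        for u, v in product(nodes, nodes):
            if u == v:
                new[(u, v)] = 1.0
                continue
            nu, nv = neighbors[u], neighbors[v]
            if not nu or not nv:
                new[(u, v)] = 0.0
                continue
            # Average similarity over all pairs of neighbors, scaled by decay.
            total = sum(sim[(a, b)] for a in nu for b in nv)
            new[(u, v)] = decay * total / (len(nu) * len(nv))
        sim = new
    return sim


# Hypothetical example graph: the path a - b - c - d.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
scores = simrank(path)

for pair in [("a", "b"), ("b", "c"), ("a", "d"), ("a", "c"), ("b", "d")]:
    print(pair, round(scores[pair], 4))
```

In this graph every path between a and b, between b and c, and between a and d has odd length, so their SimRank scores stay at zero even though a and b (and b and c) are directly connected. Only the pairs joined by even-length paths, (a, c) and (b, d), receive a positive score.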
