A comprehensive structural-based similarity measure in directed graphs

Computing the similarity between two nodes in a directed graph plays an increasingly important role in many research fields, including clustering, collaborative filtering and community mining. Many similarity measures have been proposed in recent years, such as SimRank, PSimRank and SimFusion. However, these measures consider only the expected meeting probability over paths of equal length, which may miss some latent similar nodes. Moreover, they do not distinguish the link importance of individual edges, which may lead to unreasonable rankings when searching for similar nodes. In this paper, we propose ESimRank, an effective structural-based similarity measure for computing similarities in directed graphs. We first define effective relationship strength (ERS), which distinguishes link importance using node activity, node attraction and link frequency. We then formalize the ESimRank equation by combining ERS with the expected meeting probabilities over paths of any length. Compared with existing similarity measures, ESimRank finds more latent similar nodes and produces higher-quality rankings. To support fast similarity computation, we develop an extended partial-sums-based algorithm that significantly reduces the time complexity. Extensive experiments against state-of-the-art similarity measures demonstrate the effectiveness and efficiency of ESimRank.
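The abstract names two ideas that are easier to see in code: the equal-path-length limitation of SimRank and the partial-sums technique for speeding up its computation. The sketch below is a minimal illustration under stated assumptions. It implements plain SimRank, s(a, b) = C / (|I(a)||I(b)|) · Σ_{i∈I(a)} Σ_{j∈I(b)} s(i, j), using the partial-sums memoization known from the SimRank literature. It is not ESimRank and not the authors' extended algorithm: ERS edge weights and meetings over paths of different lengths are not modeled. The function name `simrank_partial_sums`, the in-neighbour dictionary input, and the defaults `c=0.8` and `iters=10` are hypothetical choices for illustration.

```python
from typing import Dict, List

Matrix = List[List[float]]


def simrank_partial_sums(in_nbrs: Dict[int, List[int]], n: int,
                         c: float = 0.8, iters: int = 10) -> Matrix:
    """Plain SimRank via partial-sums memoization (hypothetical helper name).

    in_nbrs maps a node id in range(n) to the list of its in-neighbours.
    This is NOT ESimRank: no ERS edge weights, equal-length meetings only.
    """
    # s_0 is the identity matrix: every node is fully similar to itself.
    s: Matrix = [[1.0 if a == b else 0.0 for b in range(n)] for a in range(n)]
    for _ in range(iters):
        nxt: Matrix = [[0.0] * n for _ in range(n)]
        for a in range(n):
            nxt[a][a] = 1.0
            ia = in_nbrs.get(a, [])
            if not ia:
                continue  # no in-neighbours: s(a, b) = 0 for every b != a
            # Partial sums, computed once per a and reused for every b:
            # partial[j] = sum over i in I(a) of s_k(i, j).
            partial = [sum(s[i][j] for i in ia) for j in range(n)]
            for b in range(n):
                ib = in_nbrs.get(b, [])
                if b == a or not ib:
                    continue
                # s_{k+1}(a, b) = C / (|I(a)| |I(b)|) * sum_{j in I(b)} partial[j]
                nxt[a][b] = c * sum(partial[j] for j in ib) / (len(ia) * len(ib))
        s = nxt
    return s


# Node 0 points to 1 and 3; node 3 points to 2.
sim = simrank_partial_sums({1: [0], 3: [0], 2: [3]}, n=4)
print(sim[1][3])  # 0.8: 1 and 3 share the in-neighbour 0 one step back
print(sim[1][2])  # 0.0: 0 reaches 1 in one step but reaches 2 in two
```

Reusing `partial` for every `b` brings one iteration down to roughly O(n·m) for m edges (about O(n²·d) with average in-degree d), instead of the O(n²·d²) cost of evaluating the double sum directly for every pair. The toy graph also shows the limitation the abstract describes: nodes 1 and 2 share the ancestor 0, but backward walks from them reach 0 after different numbers of steps, so plain SimRank never lets them meet and `sim[1][2]` stays 0.0.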
