SimRank and its variants in academic literature data: measures and evaluation

SimRank is a well-known link-based similarity measure that can be applied to a citation graph to compute the similarity of academic literature data. The intuition behind SimRank is that two objects are similar if they are referenced by similar objects. SimRank has recently attracted growing interest in the areas of data mining and information retrieval. Despite its current success, SimRank has some problems that negatively affect its effectiveness in similarity computation. In this paper, we discuss three existing problems of SimRank, present the SimRank variants that have been proposed to solve them, and evaluate the effectiveness of SimRank and its variants in computing similarity for academic literature data through extensive experiments on a real-world dataset.
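To make the intuition above concrete, here is a minimal sketch of plain SimRank computed by naive fixed-point iteration, following the recursive definition of Jeh and Widom (2002): s(a, a) = 1, s(a, b) = 0 if either node has no in-neighbours, and otherwise s(a, b) is the decay factor C times the average similarity over all pairs of in-neighbours. The function name `simrank`, the decay factor C = 0.8, the fixed iteration count, and the toy citation graph are illustrative assumptions, not the setup used in this paper's experiments. The naive scheme is quadratic in the number of nodes per iteration and is only meant for small graphs.

```python
from itertools import product


def simrank(edges, nodes, c=0.8, iterations=10):
    """Naive iterative SimRank on a directed citation graph (illustrative sketch).

    edges: (citing, cited) pairs; an edge u -> v means "u cites v",
    so the in-neighbours I(v) are the papers that reference v.
    c: decay factor in (0, 1); 0.8 is an assumed, commonly used value.
    """
    # I(v): the set of objects that reference v.
    in_nbrs = {v: set() for v in nodes}
    for u, v in edges:
        in_nbrs[v].add(u)

    # Initial scores: 1 for identical nodes, 0 for every other pair.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, repeat=2)}

    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                new_sim[(a, b)] = 1.0
                continue
            ia, ib = in_nbrs[a], in_nbrs[b]
            if not ia or not ib:
                # No referencing objects on one side: similarity is defined as 0.
                new_sim[(a, b)] = 0.0
                continue
            # Average previous-iteration similarity over all in-neighbour pairs.
            total = sum(sim[(i, j)] for i in ia for j in ib)
            new_sim[(a, b)] = c * total / (len(ia) * len(ib))
        sim = new_sim
    return sim


if __name__ == "__main__":
    # Hypothetical toy graph: r cites p1 and p2; p1 cites a; p2 cites b.
    nodes = ["r", "p1", "p2", "a", "b"]
    edges = [("r", "p1"), ("r", "p2"), ("p1", "a"), ("p2", "b")]
    scores = simrank(edges, nodes)
    print(round(scores[("p1", "p2")], 4))  # 0.8
    print(round(scores[("a", "b")], 4))    # 0.64
```

In this toy graph, a and b share no common citing paper, yet they receive a non-zero score (0.8 × 0.8 = 0.64) because the papers citing them, p1 and p2, are themselves similar: both are cited by r. This is exactly the recursive "referenced by similar objects" intuition.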
