ASCOS: An Asymmetric network Structure COntext Similarity measure

Discovering similar objects in a social network raises many interesting issues. Here, we present ASCOS, an Asymmetric Structure COntext Similarity measure that captures the similarity score between any pair of nodes in a network. The definition of ASCOS is similar to that of the well-known SimRank, since both define score values recursively. However, we show that ASCOS outputs a more complete similarity score than SimRank because SimRank (and several of its variations, such as P-Rank and SimFusion) on average ignores half of the paths between nodes during calculation. To make ASCOS tractable in both computation time and memory usage, we propose two variations of ASCOS: a low-rank approximation approach and a Gauss-Seidel iterative solver for linear equations. When the target network is sparse, the run time and the required computing space of these variations are smaller than those of computing SimRank and ASCOS directly. In addition, the iterative solver divides the original network into several independent sub-systems so that a multi-core server or a distributed computing environment, such as MapReduce, can efficiently solve the problem. We compare the performance of ASCOS with other global-structure-based similarity measures, including SimRank, Katz, and LHN. The experimental results based on user evaluation suggest that ASCOS gives better results than the other measures. In addition, the asymmetric property has the potential to identify the hierarchical structure of a network. Finally, variations of ASCOS (including one distributed variation) can also reduce computation in both space and time.
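
The abstract does not state the ASCOS recurrence, so the following is only a minimal sketch under an assumed form: the score matrix S satisfies S = cPS + (1 - c)I, where P is the row-normalized adjacency matrix and c is a decay factor in (0, 1). Under that assumption, each column of S solves its own linear system (I - cP)x = (1 - c)e_j, which a Gauss-Seidel sweep can handle and which illustrates how the problem could split into independent sub-systems. The function names (row_normalize, gauss_seidel_column, similarity_matrix) and the default c = 0.9 are hypothetical and are not taken from the paper.

```python
import numpy as np

def row_normalize(A):
    """Divide each row of the adjacency matrix by its sum (all-zero rows stay zero)."""
    A = np.asarray(A, dtype=float)
    sums = A.sum(axis=1, keepdims=True)
    return np.divide(A, sums, out=np.zeros_like(A), where=sums > 0)

def gauss_seidel_column(P, j, c=0.9, tol=1e-10, max_iter=1000):
    """Solve (I - c*P) x = (1 - c) * e_j with Gauss-Seidel sweeps.

    The recurrence is an ASSUMED form of the measure, not quoted from the source.
    """
    n = P.shape[0]
    M = np.eye(n) - c * P          # strictly diagonally dominant when 0 < c < 1
    b = np.zeros(n)
    b[j] = 1.0 - c
    x = np.zeros(n)
    for _ in range(max_iter):
        x_old = x.copy()
        for i in range(n):
            # Entries x[:i] are already updated in this sweep; x[i+1:] are from the last one.
            sigma = M[i, :i] @ x[:i] + M[i, i + 1:] @ x[i + 1:]
            x[i] = (b[i] - sigma) / M[i, i]
        if np.max(np.abs(x - x_old)) < tol:
            break
    return x

def similarity_matrix(A, c=0.9):
    """Assemble S column by column; each column is an independent linear system."""
    P = row_normalize(A)
    n = P.shape[0]
    return np.column_stack([gauss_seidel_column(P, j, c) for j in range(n)])

if __name__ == "__main__":
    # Path graph 0 - 1 - 2: node 1 is the hub, nodes 0 and 2 are leaves.
    A = np.array([[0, 1, 0],
                  [1, 0, 1],
                  [0, 1, 0]])
    S = similarity_matrix(A)
    print(np.round(S, 4))
    print("S[0, 1] =", S[0, 1], " S[1, 0] =", S[1, 0])
```

On the three-node path graph, the leaf-to-hub score S[0, 1] comes out larger than the hub-to-leaf score S[1, 0], which is the kind of asymmetry the abstract suggests could reveal hierarchy. Because the columns do not depend on one another in this sketch, the loop in similarity_matrix could in principle be spread over several cores or workers.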
