SimRank: A Page Rank approach based on similarity measure

Because the Web offers rich and easily accessible information, Web search engines are becoming the dominant means of information retrieval. To rank the web pages returned for a query effectively and efficiently, we propose SimRank, a new page-ranking algorithm that scores web pages using a similarity measure from the vector space model. First, we propose a new similarity measure to compute the similarity of pages and apply it to partition a web database into several web social networks (WSNs). Second, we improve the traditional PageRank algorithm by taking into account the relevance of a page to a given query. Third, we design an efficient web crawler to download the web data. Finally, we perform experimental studies that compare the time efficiency and scoring accuracy of SimRank with those of other approaches.
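The abstract does not give the formulas behind SimRank, so the following is only an illustrative sketch of the two ingredients it names, not the authors' method. It assumes plain cosine similarity over raw term-frequency vectors as the vector space similarity, and a PageRank variant whose random jump is biased toward query-relevant pages. The function names, the damping value of 0.85, and the toy pages are all hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity of raw term-frequency vectors (an assumed stand-in
    for the paper's similarity measure, which the abstract does not define)."""
    va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def query_biased_pagerank(links, relevance, damping=0.85, iterations=50):
    """Power iteration for PageRank with a relevance-weighted random jump.

    links maps each page to the list of pages it links to; every link
    target must also appear as a key. relevance maps each page to a
    non-negative query-relevance score.
    """
    pages = list(links)
    total = sum(relevance[p] for p in pages)
    # Fall back to a uniform jump when no page matches the query.
    jump = {p: (relevance[p] / total if total else 1.0 / len(pages)) for p in pages}
    rank = dict(jump)
    for _ in range(iterations):
        # Rank held by pages without outlinks is redistributed via the jump.
        dangling = sum(rank[p] for p in pages if not links[p])
        new_rank = {p: (1 - damping + damping * dangling) * jump[p] for p in pages}
        for p in pages:
            for q in links[p]:
                new_rank[q] += damping * rank[p] / len(links[p])
        rank = new_rank
    return rank


# Hypothetical toy data: three pages, their text, and their outlinks.
pages = {
    "a": "web search ranking",
    "b": "cooking recipes pasta",
    "c": "ranking web pages by links",
}
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
query = "web ranking"
relevance = {p: cosine_similarity(query, text) for p, text in pages.items()}
scores = query_biased_pagerank(links, relevance)
```

In this sketch a page that shares no terms with the query receives no jump probability and can earn rank only through incoming links, which is one simple way to let query relevance influence a link-based score.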
