An efficient page ranking approach based on vector norms using sNorm(p) algorithm

Abstract: Millions of people around the world use the internet every day for information retrieval. Even for small tasks such as fixing a fan, cooking food, or ironing clothes, people choose to search the web. To meet these information needs there are billions of web pages scattered throughout the web, each with a different degree of relevance to the topic of interest (TOI), and this sheer size makes manual information retrieval impossible. The page ranking algorithm is an integral part of a search engine because it arranges the web pages associated with a queried TOI in order of their relevance. It therefore plays an important role in determining search quality and the user experience of information retrieval. PageRank, HITS, and SALSA are well-known page ranking algorithms based on link structure analysis of a seed set, but the rankings they produce are not yet efficient. In this paper, we propose a variant of SALSA, called sNorm(p), for the efficient ranking of web pages. Our approach uses a p-norm from the vector norm family in a novel way, because vector norms can efficiently reduce the impact of low authority weights in the hub weight calculation. Our study then compares the rankings given by PageRank, HITS, SALSA, and sNorm(p) to the same pages for the same query. The effectiveness of the proposed approach over state-of-the-art methods is shown using the performance measures Mean Reciprocal Rank (MRR), Precision, Mean Average Precision (MAP), Discounted Cumulative Gain (DCG), and Normalized DCG (NDCG). The experiments are performed on a dataset obtained by pre-processing the results collected from the first few pages retrieved for a query by the Google search engine. Thirty queries are designed based on the type and amount of in-hand domain expertise.
The extensive evaluation and result analysis are performed using MRR, Precision@k, MAP, DCG, and NDCG as the statistical performance metrics. Furthermore, the results are statistically verified using a significance test. The findings show that our approach outperforms state-of-the-art methods, attaining an MRR of 0.8666 and a MAP of 0.7957, and thus ranks web pages more efficiently than its counterparts.
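
For readers unfamiliar with the quantities named in the abstract, the following is a minimal Python sketch of a vector p-norm and of the per-query building blocks behind MRR, Precision@k, MAP, DCG, and NDCG. It is illustrative only and is not the sNorm(p) algorithm, whose hub and authority update rules are not given here. All function names are hypothetical. The sketch assumes the common `gain / log2(rank + 1)` discount for DCG, and an average precision normalized by the number of relevant results found in the ranked list; the paper may use different variants.

```python
import math


def p_norm(values, p):
    """Vector p-norm (sum |v|^p)^(1/p), assuming p >= 1."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)


def reciprocal_rank(flags):
    """1 / rank of the first relevant result, or 0.0 if none is relevant."""
    for rank, is_relevant in enumerate(flags, start=1):
        if is_relevant:
            return 1.0 / rank
    return 0.0


def precision_at_k(flags, k):
    """Fraction of the top-k results that are relevant."""
    return sum(flags[:k]) / k


def average_precision(flags):
    """Mean of Precision@rank over the ranks holding a relevant result.

    Assumption: normalized by the relevant results present in the list.
    """
    hits, total = 0, 0.0
    for rank, is_relevant in enumerate(flags, start=1):
        if is_relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def dcg(gains):
    """Discounted cumulative gain with the (assumed) log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(gains):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal else 0.0


def mean(xs):
    """Arithmetic mean; MRR and MAP are means of per-query scores."""
    return sum(xs) / len(xs)


if __name__ == "__main__":
    # Hypothetical relevance judgments (1 = relevant) for two queries.
    runs = [[0, 1, 1, 0], [1, 0, 0, 1]]
    print("MRR:", mean([reciprocal_rank(r) for r in runs]))
    print("MAP:", mean([average_precision(r) for r in runs]))
    print("NDCG:", ndcg([3, 2, 0, 1]))
    # With larger p, small entries contribute less to the norm.
    print("p=1:", p_norm([0.1, 0.9], 1), "p=2:", p_norm([0.1, 0.9], 2))
```

MRR and MAP are obtained by averaging `reciprocal_rank` and `average_precision` over all queries. The last line illustrates why a p-norm with p > 1 de-emphasizes small weights relative to a plain sum: as p grows, the norm approaches the largest entry.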
