PageRank for Billion-Scale Networks in RDBMS

Big Data processing plays a vital role for decision-makers in organizations and government, enhances the user experience, and improves the quality of predictive analytics. However, many modern data processing solutions, such as Hadoop and Spark, require a significant investment in hardware and maintenance, and they often neglect the well-established and widely used relational database management systems (RDBMSs). PageRank is vital in Google Search, where it helps determine how search results are sorted, and in social networks, where it helps measure how influential a person is within a group. PageRank is an iterative algorithm, which makes it challenging to implement over large graphs; such graphs are becoming the norm given the volume of data generated every day by social networks, IoT devices, and web content. In this paper, we study computing PageRank over very large graphs in an RDBMS on a consumer-grade server, and we compare the results with those of a dedicated graph database.
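To make the iterative computation concrete, the following is a minimal sketch of PageRank power iteration written entirely as SQL joins and GROUP BY aggregation. It is not the implementation studied in this paper: it uses Python's built-in sqlite3 module as a stand-in RDBMS, and the function name pagerank_sql, the damping factor d = 0.85, the fixed iteration count, and the uniform redistribution of rank from vertices without out-edges ("dangling" vertices) are all illustrative assumptions. Each iteration computes PR(v) = (1 - d)/N + d * (D/N + sum of PR(u)/outdeg(u) over in-neighbours u), where N is the number of vertices and D is the total rank currently held by dangling vertices.

```python
import sqlite3


def pagerank_sql(edges, damping=0.85, iterations=50):
    """Illustrative sketch (hypothetical helper): PageRank via SQL in SQLite.

    `edges` is a list of (src, dst) integer pairs. Assumption: dangling
    vertices spread their rank uniformly over all vertices.
    """
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE edge (src INTEGER, dst INTEGER)")
    cur.executemany("INSERT INTO edge VALUES (?, ?)", edges)

    # Vertex table: every edge endpoint together with its out-degree
    # (COUNT over the LEFT JOIN yields 0 for dangling vertices).
    cur.execute("""
        CREATE TABLE node AS
        SELECT v.id AS id, COUNT(e.dst) AS outdeg
        FROM (SELECT src AS id FROM edge UNION SELECT dst FROM edge) AS v
        LEFT JOIN edge AS e ON e.src = v.id
        GROUP BY v.id
    """)
    n = cur.execute("SELECT COUNT(*) FROM node").fetchone()[0]

    # Current and next rank vectors, stored as tables; start uniform at 1/N.
    cur.execute("CREATE TABLE pr_cur (id INTEGER PRIMARY KEY, pr REAL)")
    cur.execute("CREATE TABLE pr_next (id INTEGER PRIMARY KEY, pr REAL)")
    cur.execute("INSERT INTO pr_cur SELECT id, ? FROM node", (1.0 / n,))

    for _ in range(iterations):
        # Total rank mass sitting on dangling vertices in this round.
        dangling = cur.execute("""
            SELECT COALESCE(SUM(r.pr), 0.0)
            FROM pr_cur AS r JOIN node AS v ON v.id = r.id
            WHERE v.outdeg = 0
        """).fetchone()[0]
        base = (1.0 - damping) / n + damping * dangling / n

        # One power-iteration step: join each vertex with its in-edges,
        # sum PR(u) / outdeg(u) over sources u, then apply damping.
        cur.execute("DELETE FROM pr_next")
        cur.execute("""
            INSERT INTO pr_next
            SELECT v.id, ? + ? * COALESCE(SUM(r.pr / s.outdeg), 0.0)
            FROM node AS v
            LEFT JOIN edge AS e ON e.dst = v.id
            LEFT JOIN node AS s ON s.id = e.src
            LEFT JOIN pr_cur AS r ON r.id = e.src
            GROUP BY v.id
        """, (base, damping))

        # Swap: the new vector becomes the current one.
        cur.execute("DELETE FROM pr_cur")
        cur.execute("INSERT INTO pr_cur SELECT id, pr FROM pr_next")

    result = dict(cur.execute("SELECT id, pr FROM pr_cur").fetchall())
    conn.close()
    return result
```

For example, pagerank_sql([(1, 3), (2, 3), (3, 1)]) returns a dictionary in which vertex 3 has the highest rank and vertex 2, which has no in-links, the lowest. Keeping the edge list and rank vectors as ordinary tables means each iteration is a single join-and-aggregate query that the database engine can plan and execute; at billion-edge scale such a query would presumably also need indexes on the edge columns and a disk-backed database rather than an in-memory one, which this sketch does not attempt.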
