Ranking on Very Large Knowledge Graphs

Ranking plays a central role in a large number of applications driven by RDF knowledge graphs. In recent years, many popular RDF knowledge graphs have grown so large that rankings for the facts they contain cannot be computed directly on the currently common 64-bit platforms. In this paper, we tackle two problems: computing ranks on such large knowledge graphs efficiently, and computing them incrementally. First, we present D-HARE, a distributed approach for computing ranks on very large knowledge graphs. D-HARE assumes the random surfer model and relies on data partitioning to compute matrix multiplications and transpositions on disk for matrices of arbitrary size. Moreover, the data partitioning underlying D-HARE allows most of its steps to be executed in parallel. As very large knowledge graphs are often updated periodically, we tackle the incremental computation of ranks on large knowledge graphs as a second problem. We address this problem by presenting I-HARE, an approximation technique for calculating the overall ranking scores of a knowledge graph without the need to recalculate the ranking from scratch at each new revision. We evaluate our approaches by calculating ranks on the $3 \times 10^9$ triples of Wikidata and the $2.4 \times 10^9$ triples of LinkedGeoData. Our evaluation demonstrates that D-HARE is the first holistic approach for computing ranks on very large RDF knowledge graphs. In addition, our incremental approach achieves a root mean squared error of less than $10^{-7}$ in the best case. Both D-HARE and I-HARE are open-source and are available at https://github.com/dice-group/incrementalHARE.
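
To make the random surfer model and the role of data partitioning more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a plain PageRank-style power iteration on a small directed graph held in memory, whereas the approach described above works on RDF triples and on disk. The names `pagerank_blocked`, `rmse`, `block_size`, and `damping` are hypothetical and chosen only for illustration. Edges are grouped by the block of their destination node, so each block of the result vector depends only on its own edge partition and could in principle be computed in parallel.

```python
from math import sqrt


def pagerank_blocked(edges, n, damping=0.85, block_size=2, tol=1e-12, max_iter=200):
    """Random-surfer ranks by power iteration over edge partitions.

    edges: list of (source, destination) pairs with node ids in range(n).
    Illustrative assumption: a standard PageRank update, not the HARE model.
    """
    out_degree = [0] * n
    for src, _ in edges:
        out_degree[src] += 1

    # Partition the edges by the block that contains their destination node.
    n_blocks = (n + block_size - 1) // block_size
    blocks = [[] for _ in range(n_blocks)]
    for src, dst in edges:
        blocks[dst // block_size].append((src, dst))

    rank = [1.0 / n] * n
    for _ in range(max_iter):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[i] for i in range(n) if out_degree[i] == 0)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = [base] * n
        # Each block writes only to its own slice of new_rank.
        for block in blocks:
            for src, dst in block:
                new_rank[dst] += damping * rank[src] / out_degree[src]
        delta = sum(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if delta < tol:
            break
    return rank


def rmse(a, b):
    """Root mean squared error between two rank vectors of equal length."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


if __name__ == "__main__":
    toy_edges = [(0, 1), (0, 2), (1, 2), (3, 2)]
    full = pagerank_blocked(toy_edges, 4, block_size=4)
    partitioned = pagerank_blocked(toy_edges, 4, block_size=1)
    print(full)
    print(rmse(full, partitioned))
```

The `rmse` helper mirrors the error measure named in the abstract: an incremental approximation would be judged by how far its scores lie from those of a full recomputation on the same revision.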
