Holistic and scalable ranking of RDF data

The volume and number of data sources published using Semantic Web standards such as RDF grow continuously. The largest of these data sources now contain billions of facts and are updated periodically. A large number of applications driven by such data sources require the ranking of entities and facts contained in such knowledge graphs. Hence, there is a need for time-efficient approaches that can compute ranks for entities and facts simultaneously. In this paper, we present the first holistic ranking approach for RDF data. Our approach, dubbed HARE, allows the simultaneous computation of ranks for RDF triples, resources, properties and literals. To this end, HARE relies on the representation of RDF graphs as bi-partite graphs. It then employs a time-efficient extension of the random walk paradigm to bi-partite graphs. We show that by virtue of this extension, the worst-case complexity of HARE is O(n^5) while that of PageRank is O(n^6). In addition, we evaluate the practical efficiency of our approach by comparing it with PageRank on 6 real and 6 synthetic datasets with sizes up to 10^8 triples. Our results show that HARE is up to 2 orders of magnitude faster than PageRank. We also present a brief evaluation of HARE's ranking accuracy by comparing it with that of PageRank applied directly to RDF graphs. Our evaluation on 19 classes of DBpedia demonstrates that there is no statistically significant difference between the rankings of HARE and PageRank. We hence conclude that our approach goes beyond the state of the art by allowing the ranking of all RDF entities and of RDF triples without being worse w.r.t. the ranking quality it achieves on resources. HARE is open-source and is available at http://github.com/dice-group/hare.
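
To make the bi-partite idea concrete, the following is a minimal sketch of a damped random walk that alternates between the two sides of a bi-partite graph, with triples on one side and the resources, properties and literals they contain on the other. It is not the HARE implementation and makes no claim to reproduce its exact transition matrices or its complexity. The function name `bipartite_rank`, the damping factor, the fixed iteration count and the example identifiers are all assumptions chosen for illustration.

```python
from collections import defaultdict


def bipartite_rank(triples, damping=0.85, iterations=50):
    """Rank entities and triples with a random walk on a bipartite graph.

    Hypothetical sketch: one side of the graph holds the triples, the
    other holds every distinct subject, predicate and object. A walker
    alternates between the two sides; `damping` and `iterations` are
    illustrative defaults, not values taken from the paper.
    """
    triples = list(dict.fromkeys(triples))  # drop duplicates, keep order

    # Map each entity to the triples it occurs in.
    occurs_in = defaultdict(list)
    for triple in triples:
        for entity in set(triple):
            occurs_in[entity].append(triple)

    n = len(occurs_in)
    rank = {entity: 1.0 / n for entity in occurs_in}

    def spread_to_triples(entity_rank):
        # Each entity splits its mass evenly over the triples containing it.
        mass = {triple: 0.0 for triple in triples}
        for entity, containing in occurs_in.items():
            share = entity_rank[entity] / len(containing)
            for triple in containing:
                mass[triple] += share
        return mass

    for _ in range(iterations):
        triple_mass = spread_to_triples(rank)
        # Each triple splits its mass evenly over its distinct members;
        # the remaining (1 - damping) mass is a uniform random jump.
        new_rank = {entity: (1.0 - damping) / n for entity in occurs_in}
        for triple, mass in triple_mass.items():
            members = set(triple)
            for entity in members:
                new_rank[entity] += damping * mass / len(members)
        rank = new_rank

    # Triple scores follow from the final entity scores in one more step.
    return rank, spread_to_triples(rank)


if __name__ == "__main__":
    example = [
        ("ex:Alice", "ex:livesIn", "ex:Berlin"),
        ("ex:Bob", "ex:livesIn", "ex:Berlin"),
        ("ex:Berlin", "ex:capitalOf", "ex:Germany"),
    ]
    entity_rank, triple_rank = bipartite_rank(example)
    for name, score in sorted(entity_rank.items(), key=lambda kv: -kv[1]):
        print(f"{name}\t{score:.4f}")
```

In this sketch a single walk yields scores for both sides of the graph: the entity scores come from the iteration itself and the triple scores from one additional propagation step, and both sets of scores sum to 1.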
