Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMiner use case

Graph-based modelling is becoming more popular, in the sciences and elsewhere, as a flexible and powerful way to exploit data to power world-changing digital applications. Compared to the initial vision of the Semantic Web, knowledge graphs and graph databases are becoming a practical and computationally less formal way to manage graph data. On the other hand, linked data based on Semantic Web standards are a complementary, rather than alternative, approach to dealing with these data, since they still provide a common way to represent and exchange information. In this paper we introduce rdf2neo, a tool to populate Neo4j databases starting from RDF data sets, based on a configurable mapping between the two. By employing real agrigenomics-related use cases, we show how such a mapping can allow for a hybrid approach to the management of networked knowledge, based on taking advantage of the best of both RDF and property graphs.
