Sequential linked data: The state of affairs

Sequences are among the most important data structures in computer science. In the Semantic Web, however, little attention has been given to Sequential Linked Data. In previous work, we discussed the data models that Knowledge Graphs commonly use to represent sequences, and showed that these models affect query performance and that this effect is invariant across triplestore implementations. However, it remains unclear which list operations the management of Sequential Linked Data requires beyond retrieving an entire list or a range of its elements (e.g. adding or removing elements), and how these operations perform under the various list data models. Closing this knowledge gap would be a significant step towards a Semantic Web list Application Programming Interface (API) that standardizes list manipulation and generalizes beyond specific data models. To this end, we build on our previous work on the effects of sequential data models for Knowledge Graphs: we extend our benchmark and propose a set of read-write Semantic Web list operations in SPARQL, with insert, update and delete support. We identify five classic list-based sequential data structures from computer science (linked list, doubly linked list, stack, queue, and array), from which we derive nine atomic read-write operations for Semantic Web lists. We propose a SPARQL implementation of these operations for five typical RDF data models and compare their performance by executing them against six datasets of increasing size on four different triplestores. In light of our results, we discuss the feasibility of the proposed API and reflect on the state of affairs of Sequential Linked Data.
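
To illustrate why the choice of list data model can matter for a write operation, here is a minimal sketch of an "append" on two well-known RDF list representations: the `rdf:first`/`rdf:rest` collection and the `rdf:_n` container. This is an assumption-laden illustration, not the paper's method: the abstract does not name its five data models, the paper's operations are written in SPARQL rather than Python, and the function names, node identifiers, and in-memory set of triples below are all hypothetical.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FIRST, REST, NIL = RDF + "first", RDF + "rest", RDF + "nil"

def append_linked(triples, head, value, new_node):
    """Append to a non-empty rdf:first/rdf:rest list.

    The tail has to be found by following rdf:rest from the head,
    so the number of hops grows with the length of the list.
    Returns the number of hops taken (illustrative cost measure).
    """
    rest = {s: o for s, p, o in triples if p == REST}
    node, hops = head, 0
    while rest[node] != NIL:
        node = rest[node]
        hops += 1
    # Re-point the old tail at the new node, which becomes the tail.
    triples.remove((node, REST, NIL))
    triples.update({(node, REST, new_node),
                    (new_node, FIRST, value),
                    (new_node, REST, NIL)})
    return hops

def append_seq(triples, seq, value):
    """Append to an rdf:_1, rdf:_2, ... container.

    No traversal is needed, but the largest index in use must be known.
    Returns the index assigned to the new member.
    """
    prefix = RDF + "_"
    used = [int(p[len(prefix):]) for s, p, o in triples
            if s == seq and p.startswith(prefix)]
    index = max(used, default=0) + 1
    triples.add((seq, prefix + str(index), value))
    return index

# Hypothetical two-element lists ("a", "b") in each model.
linked = {("n1", FIRST, "a"), ("n1", REST, "n2"),
          ("n2", FIRST, "b"), ("n2", REST, NIL)}
seq = {("s", RDF + "_1", "a"), ("s", RDF + "_2", "b")}

append_linked(linked, "n1", "c", "n3")
append_seq(seq, "s", "c")
```

In this sketch the linked representation rewrites one triple and adds two after walking the list, while the container adds a single triple after computing the next free index. Whether such differences show up as measurable performance gaps in a triplestore is the kind of question the benchmark described above is meant to answer.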
