PoweRGen: A power-law based generator of RDFS schemas

As the number of RDF datasets available on the Web has grown significantly in recent years, the scalability and performance of Semantic Web (SW) systems are gaining importance. Current RDF benchmarking efforts either consider schema-less RDF datasets or rely on fixed RDFS schemas. In this paper, we present the first RDFS schema generator, termed PoweRGen, which takes into account the features exhibited by real SW schemas. It considers the power-law functions involved in (a) the combined in- and out-degree distribution of the property graph (which captures the domains and ranges of the properties defined in a schema) and (b) the out-degree distribution of the transitive closure (TC) of the subsumption graph (which essentially captures the class hierarchy). The synthetic schemas generated by PoweRGen respect the power-law functions given as input with an accuracy ranging between 89% and 96%, and also respect various morphological characteristics of the subsumption hierarchy, such as its depth and structure.
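To make the two quantities named above concrete, the following is a minimal illustrative sketch, not PoweRGen's actual algorithm. It assumes that subsumption edges point from a subclass to its direct superclasses and that the power law is expressed over node rank; the function names, the toy class hierarchy, and the coefficient and exponent values are all hypothetical.

```python
def tc_out_degrees(subclass_of):
    """Out-degree of each class in the transitive closure of the subsumption graph.

    `subclass_of` maps a class to the list of its direct superclasses
    (a hypothetical toy encoding of rdfs:subClassOf edges, assumed acyclic).
    The TC out-degree of a class is its number of direct and indirect superclasses.
    """
    ancestors = {}

    def visit(cls):
        # Memoised depth-first collection of all superclasses of `cls`.
        if cls not in ancestors:
            found = set()
            for parent in subclass_of.get(cls, []):
                found.add(parent)
                found |= visit(parent)
            ancestors[cls] = found
        return ancestors[cls]

    classes = set(subclass_of) | {p for ps in subclass_of.values() for p in ps}
    return {cls: len(visit(cls)) for cls in classes}


def rank_power_law(num_nodes, coefficient, exponent):
    """Target degree for the node of rank r: round(coefficient * r**exponent).

    Assumes a rank-based power law with a negative exponent; each degree
    is clamped to at least 1.
    """
    return [max(1, round(coefficient * r ** exponent))
            for r in range(1, num_nodes + 1)]


# Hypothetical four-class hierarchy.
toy_schema = {
    "Painter": ["Artist"],
    "Sculptor": ["Artist"],
    "Artist": ["Person"],
    "Person": [],
}
degrees = tc_out_degrees(toy_schema)
targets = rank_power_law(num_nodes=4, coefficient=8, exponent=-1.0)
```

A generator in this spirit would take sequences such as `targets` as input and try to build a schema whose measured distributions, such as `degrees`, match them as closely as possible.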
