Rapid Benchmarking for Semantic Web Knowledge Base Systems

We present a method for the rapid development of benchmarks for Semantic Web knowledge base systems. At its core is a synthetic data generation approach for OWL that is scalable and models real-world data. The data-generation algorithm learns from real domain documents and generates benchmark data based on the extracted properties relevant for benchmarking. This matters because the relative performance of systems varies with the structure of the ontology and the data used. However, because the Semantic Web is still young, sufficient data for benchmarking is rarely available. Our approach overcomes this shortage of real-world data and allows us to develop benchmarks for a variety of domains and applications in a very time-efficient manner. Using this method, we have created a new Lehigh BibTeX Benchmark and conducted an experiment on four Semantic Web knowledge base systems. We verified our hypothesis about the need for representative data by comparing the experimental results to those of our previous Lehigh University Benchmark. The differences between the two experiments demonstrate the influence of ontology and data on the capability and performance of the systems, and thus the need for a benchmark that is representative of the systems' intended application.
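The core idea of learning statistical properties from real documents and then generating synthetic data that matches them can be illustrated with a minimal sketch. The snippet below is not the paper's algorithm; it is a simplified, hypothetical illustration that learns only the predicate-frequency distribution from a tiny hand-made seed graph (triples as plain tuples rather than a real RDF/OWL model) and then samples synthetic triples whose predicate frequencies approximate it.

```python
import random
from collections import Counter

def learn_property_distribution(seed_triples):
    """Learn the empirical frequency of each predicate in the seed data."""
    counts = Counter(p for (_, p, _) in seed_triples)
    total = sum(counts.values())
    return {p: n / total for p, n in counts.items()}

def generate_synthetic_triples(dist, n, rng=None):
    """Sample n synthetic triples whose predicate frequencies
    approximate the learned distribution; subjects and objects
    are placeholder blank nodes."""
    rng = rng or random.Random()
    predicates = list(dist)
    weights = [dist[p] for p in predicates]
    return [(f"_:s{i}", rng.choices(predicates, weights=weights)[0], f"_:o{i}")
            for i in range(n)]

# Hypothetical seed data: a tiny BibTeX-like graph.
seed = [
    ("pub1", "author", "A"), ("pub1", "title", "T1"),
    ("pub2", "author", "B"), ("pub2", "author", "C"),
    ("pub2", "title", "T2"), ("pub1", "year", "2004"),
]
dist = learn_property_distribution(seed)          # e.g. author -> 0.5
synthetic = generate_synthetic_triples(dist, 1000, random.Random(42))
```

A real generator would of course also model class hierarchies, instance counts, and value distributions from the domain documents, but the same learn-then-sample structure applies.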