TOntoGen: A Synthetic Data Set Generator for Semantic Web Applications

The development of algorithms and applications for the Semantic Web requires high-quality ontologies for testing and validation. Our work on the SemDis [6] project centers on discovering complex relationships between entities. This requires the development of path discovery algorithms for RDF graphs, and we have found that it is often difficult to find sufficient data sets for testing these algorithms. Publicly available populated ontologies such as TAP [7] and SWETO [1] often lack either richness of relationships or sufficient instantiation of the relationships they define. SWETO is our first attempt at creating a test bed for evaluating the scalability and performance of Semantic Web techniques, technologies and tools. SWETO aims to create a large, populated ontology spanning many domains, using real-world data extracted from various high-quality web sources. Using real-world data has several benefits for testing applications. However, the characteristics of the resulting knowledge base of the populated ontology depend on the availability of data rather than on the actual real-world characteristics of the domain. Depending on real-world data also limits the types of test suites that can be created for evaluating the various scalability and performance aspects of Semantic Web-related algorithms, techniques and tools.