ToXgene: An extensible template-based data generator for XML

Synthetic collections of XML documents are useful in many applications, such as benchmarking (e.g., XMach-1 and XMark) and algorithm testing and evaluation. We present ToXgene, a template-based generator for large, consistent collections of synthetic XML documents. Templates are annotated XML Schema specifications that describe both the structure and the content of the data to be generated. Our tool was designed to be declarative and general enough to generate complex XML content and to capture most common requirements, such as those embodied in current benchmarks. The paper gives an overview of the ToXgene template specification language and of the extensibility of our tool, and it reports preliminary experiments with ToXgene carried out at the IBM Toronto Lab, which show that our tool can closely reproduce the data sets for the XMark and TPC-H benchmarks.
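The idea of a template that fixes both the structure and the content of generated XML can be illustrated with a minimal sketch in Python. This is not ToXgene's template specification language, which annotates XML Schema; the `ElementTemplate` class, its fields, and the `generate` function are hypothetical names chosen for illustration. Structure is captured by element nesting and occurrence bounds (loosely mirroring XML Schema's `minOccurs`/`maxOccurs`), and content by per-element value generators driven by a seeded random source so that runs are reproducible.

```python
import random
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ElementTemplate:
    """Hypothetical template node (not ToXgene syntax): structure plus content."""
    name: str
    min_occurs: int = 1  # lower bound on copies emitted under the parent
    max_occurs: int = 1  # upper bound on copies emitted under the parent
    content: Optional[Callable[[random.Random], str]] = None  # text-value generator
    children: List["ElementTemplate"] = field(default_factory=list)


def generate(tmpl: ElementTemplate, rng: random.Random) -> ET.Element:
    """Instantiate one element from its template, recursing into child templates."""
    elem = ET.Element(tmpl.name)
    if tmpl.content is not None:
        elem.text = tmpl.content(rng)
    for child in tmpl.children:
        # Pick how many copies of this child to emit, within its occurrence bounds.
        count = rng.randint(child.min_occurs, child.max_occurs)
        for _ in range(count):
            elem.append(generate(child, rng))
    return elem


# Illustrative template: a catalog with 2 to 4 books, each with a title and a price.
catalog = ElementTemplate("catalog", children=[
    ElementTemplate("book", min_occurs=2, max_occurs=4, children=[
        ElementTemplate("title", content=lambda r: r.choice(["XML", "Schemas", "Queries"])),
        ElementTemplate("price", content=lambda r: f"{r.uniform(5, 50):.2f}"),
    ]),
])

root = generate(catalog, random.Random(42))
print(ET.tostring(root, encoding="unicode"))
```

Seeding the generator makes the output deterministic for a given template, which matters when a synthetic data set must be regenerated exactly, as in benchmarking. A real generator would also need mechanisms this sketch omits, such as keys and references that keep documents mutually consistent.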
