A parallel general-purpose synthetic data generator

PSDG is a parallel synthetic data generator designed to produce "industrial-sized" data sets quickly using cluster computing. PSDG depends on SDDL, a synthetic data description language that provides flexibility in the types of data we can generate.
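
As a rough illustration of why this kind of generation parallelizes well, the sketch below splits a range of row identifiers across workers and derives each row from a seed tied to its identifier, so the output does not depend on how the work is divided. It is a minimal sketch under stated assumptions, not PSDG's algorithm or SDDL syntax: the names `make_row`, `generate_range`, `split_ranges`, and `generate_parallel`, the three-column row layout, and the seeding scheme are all hypothetical, and a local thread pool stands in for cluster nodes.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def make_row(row_id, base_seed=42):
    # Hypothetical row layout: (id, age, category).
    # Seeding from the row id makes a row independent of which worker builds it.
    rng = random.Random(base_seed * 1_000_003 + row_id)
    return (row_id, rng.randint(18, 90), rng.choice(["A", "B", "C"]))


def generate_range(start, stop, base_seed=42):
    # Work done by one worker: build every row in [start, stop).
    return [make_row(i, base_seed) for i in range(start, stop)]


def split_ranges(total_rows, workers):
    # Contiguous, near-equal row ranges, one per worker (ceiling division).
    step = max(1, -(-total_rows // workers))
    return [(s, min(s + step, total_rows)) for s in range(0, total_rows, step)]


def generate_parallel(total_rows, workers, base_seed=42):
    # A thread pool stands in for cluster nodes in this sketch.
    ranges = split_ranges(total_rows, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda r: generate_range(r[0], r[1], base_seed), ranges)
        # pool.map yields results in input order, so rows stay sorted by id.
        return [row for part in parts for row in part]


if __name__ == "__main__":
    rows = generate_parallel(total_rows=10, workers=3)
    for row in rows:
        print(row)
```

Because each row depends only on its identifier and the base seed, running with one worker or several yields the same data set, which is one plausible way to keep generated data reproducible as the number of nodes changes.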
