Parameter driven synthetic web database generation

To support intelligent data analysis of web information, a data warehousing system called WHOWEDA (WareHouse Of WEb DAta) has been proposed. Unlike other relational data warehouses, WHOWEDA incorporates a Web Data Model that describes web objects and their relationships as they are maintained within the data warehouse. A set of web operations has also been developed to manipulate the warehoused web information. To measure the performance of WHOWEDA and similar systems that store and manipulate web information, a synthetic web database generator called WEDAGEN (WEb DAtabase GENerator) has been developed. It can generate collections of web objects whose sizes and complexities are determined by a set of user-specified parameters. This paper presents the issues in the design and implementation of WEDAGEN, describes its system components in detail, and explains its strategy for generating synthetic web databases. A formal analysis of the generated web database and an empirical assessment of WEDAGEN are also reported.
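As a rough illustration of what "parameter-driven" generation could mean in practice, the following minimal sketch builds a small collection of linked synthetic pages from a few user-specified parameters. It is not WEDAGEN's actual design: the names `GenParams`, `Page`, and `generate_web_database`, the parameters `num_pages`, `max_out_links`, and `seed`, and the uniformly random choice of link targets are all assumptions made for this example.

```python
import random
from dataclasses import dataclass, field


@dataclass
class GenParams:
    """Hypothetical user-specified parameters (not WEDAGEN's real ones)."""
    num_pages: int = 50      # size of the synthetic collection
    max_out_links: int = 4   # upper bound on hyperlinks per page (complexity)
    seed: int = 0            # fixed seed so a run can be reproduced


@dataclass
class Page:
    """A synthetic web object: a URL plus the URLs it links to."""
    url: str
    links: list = field(default_factory=list)


def generate_web_database(params: GenParams) -> list:
    """Create pages, then give each a random set of distinct out-links."""
    rng = random.Random(params.seed)
    pages = [
        Page(url=f"http://synthetic.example/page{i}.html")
        for i in range(params.num_pages)
    ]
    for page in pages:
        # A page never links to itself; targets are drawn uniformly
        # (an assumption of this sketch, not a claim about real web graphs).
        candidates = [p.url for p in pages if p is not page]
        k = rng.randint(0, min(params.max_out_links, len(candidates)))
        page.links = rng.sample(candidates, k)
    return pages


if __name__ == "__main__":
    db = generate_web_database(GenParams(num_pages=10, max_out_links=3, seed=1))
    for page in db:
        print(page.url, "->", len(page.links), "links")
```

Fixing the seed makes two runs with identical parameters produce the same collection, which is convenient when the generated data is used to compare the performance of different systems.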
