Schema Evolution in Wikipedia - Toward a Web Information System Benchmark

Evolving the database at the core of an information system is a difficult maintenance problem that has so far been studied only in the context of traditional information systems. The problem is likely to be even more severe in web information systems, where open-source software is often developed through the contributions and collaboration of many groups and individuals. In this paper, we therefore present an in-depth analysis of the evolution history of the Wikipedia database and its schema; Wikipedia is the best-known example of a large family of web information systems built on the open-source software MediaWiki. Our study is based on: (i) a set of Schema Modification Operators that provide a simple conceptual representation for complex schema changes, and (ii) simple software tools to automate the analysis. This framework allowed us to dissect and analyze the 4.5 years of Wikipedia history, a period short in time but intense in growth and evolution. Beyond confirming the initial hunch about the severity of the problem, our analysis suggests the need for better methods and tools to support graceful schema evolution. We therefore briefly discuss documentation and automation support systems for database evolution, and suggest that the Wikipedia case study can provide the kernel of a benchmark for testing and improving such systems.
