Toward On-line Schema Evolution for Non-stop Systems

Schemas are a central component of any database system, and have been an area of intense research from the very beginning of the field. In the past, this work has mainly focused on using conceptual modeling techniques to generate schemas from application requirements, on schema normalization, and on schema mapping and schema matching in data integration systems. Dynamically changing schemas, on the other hand, have received much less attention, especially in the context of relational database systems. We conjecture three reasons for this: (1) a perception that schemas do not need to evolve, i.e., that they are essentially static; (2) a belief that if they do need to evolve, it can be done off-line; and (3) a view that, even if on-line evolution were desirable, it is too hard to implement. With the increasing number of web services, long-running transactions, and other applications that cannot afford to stay off-line for even minutes, and with increasing DBA-less installations of databases (e.g., “Digital Home”), we believe the first two of these reasons are no longer valid. This motivates our interest in the third issue. Supporting on-line evolution is non-trivial. Almost every aspect of a running database system is tied to the schema of the database. Most important are the application programs (many of which are outside the domain of the database system) that interact with the schema directly, using queries over it. Many of these programs may be affected by a change to the schema. Physical data structures like indexes are tied to the schema as well, and may have to be updated if new tables need to be created. If the schema change is to be effected while the database system is running, many of the internal components (e.g., the query processor) may also be affected, especially in the presence of live “cursors”. If the changes require creating or removing tables, then the concurrency/locking components are also affected.
In this paper, we propose an approach to on-line schema evolution that coordinates the updating of the schema with the updating of applications, employing two mechanisms. First, applications compatible with the new schema are