Detection and Recovery Techniques for Database Corruption

Increasingly, for extensibility and performance, special-purpose application code is being integrated with database system code. Such application code has direct access to database system buffers, and as a result, the danger of data being corrupted by inadvertent application writes is increased. Previously proposed hardware techniques for protecting against corruption require system calls, and their performance depends on details of the hardware architecture. We investigate an alternative approach that uses codewords associated with regions of data to detect corruption and to prevent corrupted data from being used by subsequent transactions. We develop several such techniques, which vary in level of protection, space overhead, performance, and impact on concurrency. These techniques are implemented in the Dalí main-memory storage manager, and the performance impact of each on normal processing is evaluated. Novel techniques are developed to recover when a transaction has read corrupted data caused by a bad write and has gone on to write other data in the database. These techniques use limited, relatively low-cost logging of transaction reads to trace the corruption, and they may also prove useful when resolving problems caused by incorrect data entry and other logical errors.
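The codeword idea can be illustrated with a minimal sketch. It assumes, for illustration only, a 32-bit XOR codeword per fixed-size protection region, a sanctioned write path that updates data and codeword together, an audit pass that recomputes every codeword, and a read path that checks a region before handing out its data. The names `ProtectedStore`, `write`, `read`, `audit`, `xor_words`, and `CorruptionError` and the 64-byte region size are hypothetical; the actual Dalí schemes may use a different codeword function and handle latching and concurrency, which this sketch ignores.

```python
from functools import reduce


class CorruptionError(Exception):
    """Raised when a region's stored codeword no longer matches its contents."""


def xor_words(data: bytes) -> int:
    """Fold bytes into a 32-bit codeword by XOR-ing 4-byte little-endian words."""
    padded = data + b"\x00" * (-len(data) % 4)
    return reduce(
        lambda acc, i: acc ^ int.from_bytes(padded[i:i + 4], "little"),
        range(0, len(padded), 4),
        0,
    )


class ProtectedStore:
    """Hypothetical byte buffer split into protection regions, one codeword each."""

    def __init__(self, size: int, region_size: int = 64):
        # Assumption: regions are word-aligned and tile the buffer exactly.
        if region_size % 4 or size % region_size:
            raise ValueError("region_size must be a multiple of 4 and divide size")
        self.buf = bytearray(size)
        self.region_size = region_size
        self.codewords = [0] * (size // region_size)  # all-zero region -> codeword 0

    def _region(self, r: int) -> bytes:
        lo = r * self.region_size
        return bytes(self.buf[lo:lo + self.region_size])

    def write(self, offset: int, data: bytes) -> None:
        """Sanctioned update path: change the bytes and patch each codeword incrementally."""
        if offset < 0 or offset + len(data) > len(self.buf):
            raise IndexError("write outside buffer")
        end, pos = offset + len(data), offset
        while pos < end:
            r = pos // self.region_size
            stop = min((r + 1) * self.region_size, end)
            lo, hi = pos - pos % 4, -(-stop // 4) * 4  # word-aligned span covering the change
            old = xor_words(bytes(self.buf[lo:hi]))
            self.buf[pos:stop] = data[pos - offset:stop - offset]
            new = xor_words(bytes(self.buf[lo:hi]))
            self.codewords[r] ^= old ^ new  # cancel the old words, fold in the new ones
            pos = stop

    def audit(self) -> list[int]:
        """Recompute every codeword; return indices of regions that fail the check."""
        return [r for r, cw in enumerate(self.codewords) if xor_words(self._region(r)) != cw]

    def read(self, offset: int, n: int) -> bytes:
        """Precheck variant: refuse to hand out data from a region that fails its check."""
        for r in range(offset // self.region_size, (offset + n - 1) // self.region_size + 1):
            if xor_words(self._region(r)) != self.codewords[r]:
                raise CorruptionError(f"region {r} failed its codeword check")
        return bytes(self.buf[offset:offset + n])


# Usage: a sanctioned write keeps the codeword in sync; a stray write does not.
store = ProtectedStore(size=256, region_size=64)
store.write(10, b"hello")
print(store.audit())   # []
store.buf[70] ^= 0xFF  # simulated inadvertent write that bypasses write()
print(store.audit())   # [1]
```

XOR is used here because the codeword can be maintained from only the old and new contents of the updated words, without rescanning the whole region. A stray write straight into `buf` skips that step, which is exactly the inconsistency the audit or the read-time check catches. Detection is at region granularity: the sketch reports which region is inconsistent, not which byte was overwritten.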
