Using codewords to protect database data from a class of software errors

Increasingly, for extensibility and performance, special-purpose application code is being integrated with database system code. Such application code has direct access to database system buffers and, as a result, the danger of data being corrupted due to inadvertent application writes is increased. Previously proposed hardware techniques to protect data from corruption required system calls, and their performance depended on the details of the hardware architecture. We investigate an alternative approach which uses codewords associated with regions of data to detect corruption and to prevent corrupted data from being used by subsequent transactions. We develop several such techniques which vary in the level of protection, space overhead, performance and impact on concurrency. These techniques are implemented in the Dalí main-memory storage manager, and the performance impact of each on normal processing is evaluated. Novel techniques are developed to recover when a transaction has read corrupted data caused by a bad write, and then gone on to write other data in the database. These techniques use limited and relatively low-cost logging of transaction reads to trace the corruption, and may also prove useful when resolving problems caused by incorrect data entry and other logical errors.
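
The codeword idea can be illustrated with a minimal sketch. The abstract does not specify the codeword function or the region layout, so the sketch assumes an XOR parity over 32-bit words; the names `ProtectedRegion`, `write`, `read` and `CorruptionDetected` are hypothetical and are not taken from Dalí. Sanctioned updates maintain the codeword incrementally, while a stray write that bypasses the update path leaves the stored codeword stale, so a later check can detect the mismatch before the data is used.

```python
import struct

WORD = 4  # bytes per word (assumed granularity, not from the paper)


class CorruptionDetected(Exception):
    """Raised when a region no longer matches its stored codeword."""


def codeword(region) -> int:
    """Assumed codeword function: XOR of all 32-bit words in the region."""
    cw = 0
    for (word,) in struct.iter_unpack("<I", region):
        cw ^= word
    return cw


class ProtectedRegion:
    """A data region paired with a codeword (hypothetical illustration)."""

    def __init__(self, size_words: int):
        self.data = bytearray(size_words * WORD)
        self.cw = codeword(self.data)  # all-zero region, so codeword is 0

    def write(self, word_index: int, value: int) -> None:
        """Sanctioned update: change the word and fold the change into the codeword."""
        offset = word_index * WORD
        (old,) = struct.unpack_from("<I", self.data, offset)
        struct.pack_into("<I", self.data, offset, value)
        self.cw ^= old ^ value  # XOR out the old word, XOR in the new one

    def is_consistent(self) -> bool:
        """Recompute the codeword from the data and compare with the stored one."""
        return codeword(self.data) == self.cw

    def read(self, word_index: int) -> int:
        """Check the region before returning data, so corrupted data is not used."""
        if not self.is_consistent():
            raise CorruptionDetected("region does not match its codeword")
        (value,) = struct.unpack_from("<I", self.data, word_index * WORD)
        return value


region = ProtectedRegion(size_words=8)
region.write(2, 0xDEADBEEF)
value_before = region.read(2)       # passes the check

region.data[9] ^= 0xFF              # simulated stray write that bypasses write()
detected = not region.is_consistent()
```

Checking the whole region on every read, as done here, is the most expensive placement of the check; the abstract notes that the actual techniques trade off protection level, space overhead, performance and concurrency. A simple XOR parity can also miss corruptions whose bit changes cancel out, which is one reason the sketch should be read as an illustration rather than as the scheme evaluated in the paper.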
