Integrating reliable memory in databases

Abstract. Recent results in the Rio project at the University of Michigan show that it is possible to create an area of main memory that is as safe as disk from operating system crashes. This paper explores how to integrate the reliable memory provided by the Rio file cache into a database system. Prior studies have analyzed the performance benefits of reliable memory; we focus instead on how different designs affect reliability. We propose three designs for integrating reliable memory into databases: non-persistent database buffer cache, persistent database buffer cache, and persistent database buffer cache with protection. Non-persistent buffer caches use an I/O interface to reliable memory and require the fewest modifications to existing databases. However, they waste memory capacity and bandwidth due to double buffering. Persistent buffer caches use a memory interface to reliable memory by mapping it into the database address space. This places reliable memory under complete database control and eliminates double buffering, but it may expose the buffer cache to database errors. Our third design reduces this exposure by write protecting the buffer pages. Extensive fault tests show that mapping reliable memory into the database address space does not significantly hurt reliability. This is because wild stores rarely touch dirty, committed pages written by previous transactions. As a result, we believe that databases should use a memory interface to reliable memory.
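The contrast between the three designs can be sketched in a few lines of Python. This is an illustration only, not the paper's implementation: an ordinary temporary file stands in for the Rio file cache, and every name here (`make_store`, `io_interface_update`, `memory_interface_update`, `stray_store_is_blocked`) is hypothetical. The read-only mapping in the last function is only an analogue of page write protection; a real system would more likely change hardware page permissions (for example with POSIX `mprotect`), which this sketch does not do.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE  # one buffer page; the page size is an assumption of this sketch


def make_store(pages):
    """Create a zero-filled file that stands in for reliable memory."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(bytes(pages * PAGE))
    return path


def io_interface_update(path, page_no, data):
    """Design 1: I/O interface. The database fills a private buffer and
    copies it out with write(), so the page lives in memory twice."""
    private_buffer = bytearray(PAGE)      # the database's own copy
    private_buffer[:len(data)] = data
    with open(path, "r+b") as f:
        f.seek(page_no * PAGE)
        f.write(private_buffer)           # second copy goes to the file cache


def memory_interface_update(path, page_no, data):
    """Design 2: memory interface. The store goes straight into the
    mapping, so there is no private copy to keep in sync."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE) as m:
            off = page_no * PAGE
            m[off:off + len(data)] = data
            m.flush()  # needed for an ordinary file; only a stand-in here


def stray_store_is_blocked(path):
    """Design 3 (analogue): keep the mapping read-only outside updates,
    so an unintended store fails instead of corrupting a page."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            try:
                m[0:1] = b"X"             # simulated wild store
            except TypeError:
                return True
    return False
```

A caller would create a store with `make_store(2)`, update pages through either interface, and read the file back to see that both paths reach the same bytes. The difference the abstract points to is where the extra copy lives and who can overwrite the mapped pages.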
