Track-based disk logging

Disk logging is a fundamental building block for fault-tolerant system design because it captures a persistent snapshot of critical system state for recovery after failures. Logging typically must be synchronous to guarantee recoverability, so speeding up synchronous disk writes is critical to fault-tolerant systems that are based on disk logging. This paper describes a novel track-based disk logging technique that reduces the latency of synchronous disk writes to the minimum without compromising data integrity guarantees. As an application of track-based disk logging, we present the design and implementation of a low-write-latency disk subsystem called Trail. Through a fully operational Trail prototype, we demonstrate that Trail achieves the best known disk logging performance, which is close to the data transfer delay plus command processing overhead: a 4 KByte disk write takes less than 1.5 msec. Based on the TPC-C benchmark, the transaction throughput of a Trail-based transaction processing system is on average 62.9% higher than that of one based on a standard disk subsystem, and the database logging-related disk I/O overhead is reduced by 42%.
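For readers unfamiliar with the operation being optimized, the following is a minimal sketch of a synchronous log append on a conventional disk subsystem. It is not Trail's technique; it only illustrates the baseline write-then-flush pattern whose latency the abstract discusses. The function name `sync_append` and the file layout are hypothetical, and the measured time depends entirely on the underlying device and file system.

```python
import os
import time

def sync_append(path, record):
    """Append `record` (bytes) to the log file at `path`.

    Returns the elapsed time in seconds. The function returns only after
    os.fsync() completes, i.e. after the OS reports the data as flushed.
    `sync_append` is an illustrative name, not part of any Trail interface.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        start = time.perf_counter()
        view = memoryview(record)
        # os.write may write fewer bytes than requested, so loop until done.
        while len(view) > 0:
            written = os.write(fd, view)
            view = view[written:]
        # Force the written data out of the OS cache before acknowledging.
        os.fsync(fd)
        return time.perf_counter() - start
    finally:
        os.close(fd)
```

Calling `sync_append("log.bin", b"\x00" * 4096)` would time one 4 KByte synchronous write of the kind the abstract uses as its unit of measurement; on a rotating disk this figure normally also includes seek and rotational delay.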
