Revisiting Structured Storage: A Transactional Record Store

An increasing number of applications, such as electronic mail servers, web servers, and personal information managers, handle large amounts of homogeneous data. This data can be effectively represented as records and manipulated through simple operations such as reading, writing, and searching records. Unfortunately, modern storage systems are ill-suited to the needs of these applications. On the one hand, file systems store only unstructured data (byte strings) and offer very limited reliability guarantees. On the other hand, relational databases store structured data and provide both concurrency control and transactions, but they are often too slow, too complex, and too difficult to manage for many applications. This paper presents a transactional record store that directly addresses the needs of modern applications. The store combines the simplicity and manageability of the file system interface with a select few features for managing record-oriented data. We describe the principles that guide the design of our transactional record store, as well as the design itself. We also present a prototype implementation and an evaluation of its performance.
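To make the record-oriented interface concrete, here is a minimal sketch assuming a hypothetical in-memory API. The names `RecordStore`, `Transaction`, `read`, `write`, `delete`, `search`, `commit`, and `abort` are illustrative and not the paper's actual interface. Each transaction buffers its writes privately and applies them to the shared store only on commit, so an aborted transaction leaves no trace. The sketch is single-threaded and deliberately omits concurrency control and durability, both of which a real transactional record store would need; search is a plain linear scan with a caller-supplied predicate.

```python
from typing import Callable, Dict, Iterator, Optional, Tuple

# Hypothetical record representation: a flat mapping from field name to value.
Record = Dict[str, object]


class RecordStore:
    """Hypothetical in-memory store mapping record IDs to committed records."""

    def __init__(self) -> None:
        self._records: Dict[str, Record] = {}

    def begin(self) -> "Transaction":
        return Transaction(self)


class Transaction:
    """Buffers writes privately; nothing reaches the store until commit()."""

    def __init__(self, store: RecordStore) -> None:
        self._store = store
        self._writes: Dict[str, Optional[Record]] = {}  # None marks a deletion
        self._done = False

    def _check_open(self) -> None:
        if self._done:
            raise RuntimeError("transaction already committed or aborted")

    def read(self, rid: str) -> Optional[Record]:
        # Read-your-own-writes: consult the private buffer before the store.
        self._check_open()
        rec = self._writes[rid] if rid in self._writes else self._store._records.get(rid)
        return dict(rec) if rec is not None else None  # return a copy

    def write(self, rid: str, record: Record) -> None:
        self._check_open()
        self._writes[rid] = dict(record)

    def delete(self, rid: str) -> None:
        self._check_open()
        self._writes[rid] = None

    def search(self, pred: Callable[[Record], bool]) -> Iterator[Tuple[str, Record]]:
        # Linear scan over committed and buffered IDs, in sorted ID order.
        self._check_open()
        for rid in sorted(set(self._store._records) | set(self._writes)):
            rec = self.read(rid)
            if rec is not None and pred(rec):
                yield rid, rec

    def commit(self) -> None:
        # Apply all buffered writes and deletions together, then close.
        self._check_open()
        for rid, rec in self._writes.items():
            if rec is None:
                self._store._records.pop(rid, None)
            else:
                self._store._records[rid] = rec
        self._done = True

    def abort(self) -> None:
        # Discard buffered changes; the store is left untouched.
        self._check_open()
        self._writes.clear()
        self._done = True


if __name__ == "__main__":
    store = RecordStore()
    t = store.begin()
    t.write("msg-1", {"from": "alice", "subject": "hi"})
    t.write("msg-2", {"from": "bob", "subject": "lunch"})
    t.commit()

    t2 = store.begin()
    t2.write("msg-3", {"from": "alice", "subject": "draft"})
    t2.abort()  # msg-3 is discarded

    print(list(store.begin().search(lambda r: r["from"] == "alice")))
```

Running the usage example prints only `msg-1`, because `msg-3` was written inside an aborted transaction. Buffering writes until commit is one simple way to obtain all-or-nothing updates in memory; a persistent store would typically pair it with logging or a similar recovery mechanism.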
