A Snappy B+-Trees Index Reconstruction for Main-Memory Storage Systems

A main-memory storage system uses main memory rather than disk as its primary storage and efficiently supports real-time applications that require high performance. For real-time service, the time to recover the system from a failure must be short, and fast index reconstruction is an essential step in data recovery. In this paper, we present a snappy B+-tree reconstruction algorithm called Max-PL. The basic algorithm (called Max) stores the max keys of the leaf nodes at backup time and reconstructs the B+-tree index structure from these pre-stored max keys at restoration time. Max-PL adds parallelism to Max to improve performance. We analyze the time complexity of the algorithm and evaluate it experimentally against other methods. Max-PL achieves speedups of at least 2 over Batch Construction and 6.7 over B+-tree Insertion.
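The idea behind Max can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the leaf nodes are already restored in key order, that each leaf is identified by its position, and that one max key per leaf was saved at backup time. The names `Node`, `build_from_max_keys`, `find_leaf`, and the `fanout` parameter are hypothetical. The sketch builds the internal levels bottom-up from the saved max keys without reading any leaf contents. The parallel step of Max-PL is omitted, and the last node of a level may be underfull, which a real B+-tree would rebalance.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Node:
    """Internal node (hypothetical layout, for illustration only)."""
    keys: List[int]                     # max key of every child except the last
    children: List[Union["Node", int]]  # child nodes, or leaf positions at the lowest level
    max_key: int                        # largest key reachable under this node


def build_from_max_keys(leaf_max_keys: List[int], fanout: int = 4) -> Node:
    """Build the internal levels bottom-up from one saved max key per leaf."""
    if not leaf_max_keys:
        raise ValueError("at least one leaf is required")
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    # Each entry pairs a child (leaf position or Node) with its max key.
    level = [(i, k) for i, k in enumerate(leaf_max_keys)]
    while True:
        next_level = []
        for start in range(0, len(level), fanout):
            group = level[start:start + fanout]
            node = Node(
                keys=[mk for _, mk in group[:-1]],
                children=[child for child, _ in group],
                max_key=group[-1][1],
            )
            next_level.append((node, node.max_key))
        level = next_level
        if len(level) == 1:
            return level[0][0]


def find_leaf(root: Node, key: int) -> int:
    """Return the position of the leaf whose key range covers `key`."""
    node: Union[Node, int] = root
    while isinstance(node, Node):
        # First child whose max key is >= key; falls through to the last child.
        node = node.children[bisect_left(node.keys, key)]
    return node


root = build_from_max_keys([10, 20, 30, 40, 50, 60, 70], fanout=3)
print(find_leaf(root, 25))  # leaf at position 2 holds keys in (20, 30]
```

Because every internal key is copied from the saved list, the cost of this sketch is linear in the number of leaves, which is the property that makes reconstruction from max keys attractive compared with reinserting every record.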
