LogBase: A Scalable Log-structured Database System in the Cloud

Many applications, such as financial transaction processing (e.g., stock trading), are write-heavy by nature, and the shift from reads to writes in web applications has accelerated in recent years. Write-ahead logging is a common approach for providing recovery capability while improving performance in most storage systems. However, keeping the log separate from the application data incurs write overhead in write-heavy environments, which adversely affects both write throughput and recovery time. In this paper, we introduce LogBase, a scalable log-structured database system that adopts log-only storage to remove the write bottleneck and support fast system recovery. It is designed to be dynamically deployed on commodity clusters to take advantage of the elastic scaling property of cloud environments. LogBase provides in-memory multiversion indexes to support efficient access to data maintained in the log. It also supports transactions that bundle read and write operations spanning multiple records. We implemented the proposed system and compared it with HBase and a disk-based log-structured record-oriented system modeled after RAMCloud. The experimental results show that LogBase provides sustained write throughput, efficient data access out of the cache, and effective system recovery.
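
The combination of log-only storage and an in-memory multiversion index can be illustrated with a small sketch. This is not LogBase's implementation: the class `TinyLogStore`, its methods, the JSON record layout, and the use of an in-memory byte buffer in place of a distributed log file are all assumptions made for illustration. It shows only the general idea that each write is a single append, that the index maps a key and version to a log offset, and that recovery amounts to rescanning the log.

```python
import bisect
import io
import json


class TinyLogStore:
    """Toy log-only store (hypothetical): the log is the sole copy of the data."""

    def __init__(self, log=None):
        # An in-memory byte buffer stands in for the log file (assumption).
        self.log = log if log is not None else io.BytesIO()
        # key -> (sorted versions, matching log offsets)
        self.index = {}
        self._rebuild_index()

    def _index_put(self, key, version, offset):
        versions, offsets = self.index.setdefault(key, ([], []))
        pos = bisect.bisect_right(versions, version)
        versions.insert(pos, version)
        offsets.insert(pos, offset)

    def _rebuild_index(self):
        """Recovery: scan the log once and repopulate the in-memory index."""
        self.index.clear()
        self.log.seek(0)
        while True:
            offset = self.log.tell()
            line = self.log.readline()
            if not line:
                break
            record = json.loads(line)
            self._index_put(record["key"], record["version"], offset)

    def write(self, key, version, value):
        """Append one record; there is no separate data file to update."""
        self.log.seek(0, io.SEEK_END)
        offset = self.log.tell()
        record = {"key": key, "version": version, "value": value}
        self.log.write(json.dumps(record).encode("utf-8") + b"\n")
        self._index_put(key, version, offset)

    def read(self, key, as_of=None):
        """Return the newest value with version <= as_of (latest if None)."""
        if key not in self.index:
            return None
        versions, offsets = self.index[key]
        pos = len(versions) if as_of is None else bisect.bisect_right(versions, as_of)
        if pos == 0:
            return None
        self.log.seek(offsets[pos - 1])
        return json.loads(self.log.readline())["value"]
```

In this sketch, reading with `as_of` set to an older version returns the value that was current at that version, and constructing a new `TinyLogStore` over the same bytes rebuilds the index without any separate data file. A real system would also need durability, concurrency control, and log compaction, none of which are modeled here.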
