Scalable and Highly Available Database Systems in the Cloud

Cloud computing allows users to tap into a massive pool of shared computing resources such as servers, storage, and networks. These resources are provided to users as a service, allowing them to “plug into the cloud” much as they would plug into a utility grid. The promise of the cloud is to free users from the tedious and often complex task of managing and provisioning computing resources to run applications. At the same time, the cloud brings several additional benefits, including a pay-as-you-go cost model, easier deployment of applications, elastic scalability, high availability, and a more robust and secure infrastructure. One important class of applications that users are increasingly deploying in the cloud is database management systems. Database management systems differ from other types of applications in that they manage large amounts of state that is frequently updated and that must be kept consistent at all scales and in the presence of failures. This makes it difficult to provide scalability and high availability for database systems in the cloud. In this thesis, we show how cloud technologies and relational database systems can be exploited to provide a highly available and scalable database service in the cloud. The first part of the thesis presents RemusDB, a reliable, cost-effective high availability solution that is implemented as a service provided by the virtualization platform. By exploiting the capabilities of virtualization, RemusDB can make any database system highly available with little or no code modification. The second part of the thesis presents two systems that aim to provide elastic scalability for database systems in the cloud using two very different approaches. Together, the three systems presented in this thesis bring us closer to the goal of building a scalable and reliable transactional database service in the cloud.
