Data Management in the Cloud: Challenges and Opportunities

Cloud computing has emerged as a successful paradigm of service-oriented computing and has revolutionized the way computing infrastructure is used. This success has led to a proliferation of applications deployed on various cloud platforms, along with an increase in the scale of the data that these applications generate and consume. Scalable database management systems (DBMSs) form a critical part of the cloud infrastructure, and attempts to address the challenges of managing big data have produced a plethora of systems. This book aims to clarify some of the important concepts in the design space of scalable data management in cloud computing infrastructures. Among the questions it aims to answer are: which systems are appropriate for a specific set of application requirements, what are the research challenges in data management for the cloud, and what is novel in the cloud for database researchers? We also aim to address one basic question: does cloud computing pose new challenges in scalable data management, or is it just a reincarnation of old problems? We provide a comprehensive background study of state-of-the-art systems for scalable data management and analysis, and we identify important aspects of the design of these systems along with their applicability and scope. A thorough understanding of current solutions and a precise characterization of the design space are essential for clearing the "cloudy skies of data management" and ensuring the success of DBMSs in the cloud, thus emulating the success enjoyed by relational databases in traditional enterprise settings.

Table of Contents: Introduction / Distributed Data Management / Cloud Data Management: Early Trends / Transactions on Co-located Data / Transactions on Distributed Data / Multi-tenant Database Systems / Concluding Remarks
