Data Management Challenges in Cloud Computing Infrastructures

The challenge of building consistent, available, and scalable data management systems capable of serving petabytes of data for millions of users has confronted both the data management research community and large internet enterprises. Currently proposed solutions to scalable data management, driven primarily by prevalent application requirements, limit consistent access to the granularity of single objects, rows, or keys, thereby trading off consistency for high scalability and availability. However, the growing popularity of “cloud computing”, the resulting shift of a large number of internet applications to the cloud, and the quest to provide data management services in the cloud have opened up the challenge of designing data management systems that provide consistency guarantees at a granularity larger than single rows and keys. In this paper, we analyze the design choices that allowed modern scalable data management systems to achieve scalability orders of magnitude higher than that of traditional databases. With this understanding, we highlight some design principles for systems providing scalable and consistent data management as a service in the cloud.
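To make the notion of "consistency at the granularity of a single key" concrete, the following is a minimal single-process sketch, not the design of any system discussed in the paper. It assumes a hypothetical store, `PartitionedKVStore`, whose keys are hash-partitioned (using `zlib.crc32` as a stand-in partitioning function) with one lock per partition. A single-key read-modify-write (the hypothetical `compare_and_set`) only ever holds one partition's lock. By contrast, an atomic update across several keys (the hypothetical `multi_key_update`) must lock every partition it touches.

```python
import threading
import zlib


class PartitionedKVStore:
    """Hypothetical sketch: keys are hash-partitioned and each partition
    has its own lock, so single-key operations never coordinate with
    other partitions."""

    def __init__(self, num_partitions: int = 4) -> None:
        self._data = [dict() for _ in range(num_partitions)]
        self._locks = [threading.Lock() for _ in range(num_partitions)]

    def partition_of(self, key: str) -> int:
        # Deterministic hash (crc32) so key placement is stable across runs.
        return zlib.crc32(key.encode()) % len(self._data)

    def get(self, key: str, default=None):
        p = self.partition_of(key)
        with self._locks[p]:
            return self._data[p].get(key, default)

    def put(self, key: str, value) -> None:
        p = self.partition_of(key)
        with self._locks[p]:
            self._data[p][key] = value

    def compare_and_set(self, key: str, expected, new) -> bool:
        # Atomic read-modify-write on ONE key: only one partition lock is held.
        p = self.partition_of(key)
        with self._locks[p]:
            if self._data[p].get(key) != expected:
                return False
            self._data[p][key] = new
            return True

    def multi_key_update(self, updates: dict) -> int:
        # Atomic update of several keys: every partition involved must be
        # locked. Locks are taken in ascending partition order to avoid
        # deadlock. Returns the number of partitions that were coordinated.
        parts = sorted({self.partition_of(k) for k in updates})
        for p in parts:
            self._locks[p].acquire()
        try:
            for k, v in updates.items():
                self._data[self.partition_of(k)][k] = v
        finally:
            for p in reversed(parts):
                self._locks[p].release()
        return len(parts)
```

In a real distributed deployment the partitions would live on different servers, so the multi-key path would need a distributed commit protocol (for example, two-phase commit) rather than local locks. This sketch only illustrates why restricting atomicity to one key keeps each operation's coordination footprint to a single partition, while multi-key guarantees grow with the number of partitions touched.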
