Evaluating and Optimizing Indexing Schemes for a Cloud-Based Elastic Key-Value Store

Cloud computing has emerged to provide virtual, pay-as-you-go computing and storage services over the Internet, where the usage cost directly depends on consumption. One compelling feature of Clouds is elasticity: a user can demand, and be immediately given access to, more (or fewer) resources as requirements change. However, this feature introduces new challenges in developing applications and services. In this paper, we focus on the challenges of data management in Cloud environments in view of elasticity. In particular, we consider an elastic key-value store that is used to cache intermediate results in a service-oriented system and to accelerate future queries by reusing the stored values. Such a key-value store can clearly benefit from the elasticity offered by Clouds by expanding the cache during query-intensive periods. However, supporting an elastic key-value store involves many challenges, including selecting an appropriate indexing scheme, migrating data upon elastic resource provisioning, and removing certain overheads in the Cloud. This paper focuses on the design of an elastic key-value store. We consider three ubiquitous indexing methods, B+-Trees, Extendible Hashing, and Bloom Filters, and we show how these schemes can be modified to exploit elasticity in Clouds. We also evaluate various performance aspects associated with the use of these indexing schemes. Furthermore, we have developed a heuristic for requesting elastic compute resources to expand the cache such that instance startup overheads are minimized. Our evaluation studies show that the choice of index depends on various application- and system-level parameters that we have identified. While we confirm that B+-Trees, which pervade many of today's key-value systems, would scale well, we also identify cases in which Extendible Hashing would outperform B+-Trees.
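To make one of the indexing schemes above concrete, the following is a minimal sketch of textbook Extendible Hashing. It is not the elastic variant developed in this paper. It assumes a single in-memory process, SHA-256 truncated to 32 bits as the hash function, and a fixed per-bucket capacity. The names `ExtendibleHash`, `Bucket`, `hash_key`, `put`, `get`, and `_split` are hypothetical. The property most relevant to elasticity is that an overflowing bucket splits locally, redistributing only its own keys, and the directory doubles only when the split bucket's local depth already equals the global depth.

```python
import hashlib

MAX_DEPTH = 32  # the hash below has 32 bits, so deeper directories cannot help


def hash_key(key: str) -> int:
    # Stable 32-bit hash (Python's built-in hash() of str is salted per process)
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:4], "big")


class Bucket:
    def __init__(self, local_depth: int, capacity: int):
        self.local_depth = local_depth  # number of low-order hash bits this bucket owns
        self.capacity = capacity
        self.items = {}


class ExtendibleHash:
    """Hypothetical in-memory extendible hash table (illustrative sketch)."""

    def __init__(self, bucket_capacity: int = 4):
        self.global_depth = 1
        self.directory = [Bucket(1, bucket_capacity), Bucket(1, bucket_capacity)]

    def _index(self, key: str) -> int:
        # The low-order global_depth bits of the hash select a directory slot
        return hash_key(key) & ((1 << self.global_depth) - 1)

    def get(self, key: str):
        return self.directory[self._index(key)].items.get(key)

    def put(self, key: str, value) -> None:
        while True:
            bucket = self.directory[self._index(key)]
            if key in bucket.items or len(bucket.items) < bucket.capacity:
                bucket.items[key] = value
                return
            self._split(bucket)  # overflow: split, then retry the insert

    def _split(self, bucket: Bucket) -> None:
        if bucket.local_depth == self.global_depth:
            if self.global_depth >= MAX_DEPTH:
                raise RuntimeError("too many keys share the same 32-bit hash")
            # Double the directory; slot i and slot i + 2**old_depth share a bucket
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bucket.local_depth += 1
        distinguishing_bit = 1 << (bucket.local_depth - 1)
        new_bucket = Bucket(bucket.local_depth, bucket.capacity)
        # Slots that pointed at the old bucket and have the new bit set move over
        for i, b in enumerate(self.directory):
            if b is bucket and i & distinguishing_bit:
                self.directory[i] = new_bucket
        # Only this bucket's keys are redistributed; all other buckets are untouched
        old_items, bucket.items = bucket.items, {}
        for k, v in old_items.items():
            self.directory[self._index(k)].items[k] = v


table = ExtendibleHash(bucket_capacity=2)
for i in range(100):
    table.put(f"key{i}", i)
print(table.global_depth, table.get("key42"))
```

As a purely illustrative assumption, if each bucket were hosted on its own cache node, a split would correspond to provisioning one new node and moving part of a single bucket's keys to it, rather than rehashing the whole store.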
