Exploiting Virtual Metadata Servers to Provide Multi-Level Consistency for Key-Value Object-Based Data Store

Distributed data stores are a fundamental building block for many Internet services. In large-scale distributed data stores, the scalability and consistency of metadata services tend to become the bottleneck, and various schemes have been proposed to tackle this challenge. Centralized single-node metadata services offer low-overhead consistency maintenance but poor scalability, while distributed metadata servers scale well but often suffer from complicated management and high-overhead consistency maintenance. Some key-value object-based storage systems locate and access objects through a hash function (e.g., a consistent hashing table) and therefore have no dedicated physical servers for metadata services. For such key-value stores, we propose a scheme called virtual metadata servers (virtual MDS), which creates an opportunity to provide both high performance and multi-level consistency. Whereas a conventional key-value data store distributes metadata across data nodes, our scheme uses the proxy nodes on which virtual disks are created as virtual MDS to hold the metadata of those virtual disks. We further combine the characteristics of virtual disks and metadata services to implement a multi-level consistency strategy for a key-value object-based store without dedicated physical metadata servers. With virtual MDS, we use version information to update data asynchronously, check version consistency periodically, and correct stale entries as needed. In this way, virtual MDS can provide multiple levels of consistency to meet users' differing read-performance demands. Experimental results demonstrate that, compared with a standard storage system enforcing strict consistency, our scheme with relaxed consistency improves random write performance by 50% and random read performance by 16%.
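To make the version-based mechanism concrete, the sketch below illustrates one plausible reading of the design described above: a virtual MDS co-located with a proxy node records a monotonically increasing version per metadata entry, propagates writes to data nodes asynchronously, runs a periodic checker that detects and repairs stale replicas, and serves reads at either a strict or a relaxed consistency level. This is a minimal illustration, not the authors' implementation; all names (`VirtualMDS`, `ConsistencyLevel`, the dict-based replica stores) are hypothetical.

```python
import threading
import time
from enum import Enum


class ConsistencyLevel(Enum):
    STRICT = "strict"    # read must reflect the latest committed version
    RELAXED = "relaxed"  # read may return a slightly stale entry


class VirtualMDS:
    """Hypothetical virtual metadata server co-located with a proxy node.

    Each metadata entry carries a version number; data nodes are updated
    asynchronously, and a background checker repairs entries whose
    versions lag behind the authoritative copy on the proxy.
    """

    def __init__(self, data_nodes):
        self.data_nodes = data_nodes  # list of replica stores (dict-like), an assumption
        self.versions = {}            # object_id -> latest committed version
        self.metadata = {}            # object_id -> (version, entry)
        self.lock = threading.Lock()

    def write(self, object_id, entry):
        """Commit metadata locally, then propagate to replicas asynchronously."""
        with self.lock:
            version = self.versions.get(object_id, 0) + 1
            self.versions[object_id] = version
            self.metadata[object_id] = (version, entry)
        # Asynchronous update: the writer does not wait for the replicas.
        threading.Thread(
            target=self._propagate, args=(object_id, version, entry), daemon=True
        ).start()

    def _propagate(self, object_id, version, entry):
        for node in self.data_nodes:
            node[object_id] = (version, entry)

    def read(self, object_id, level=ConsistencyLevel.RELAXED):
        """Relaxed reads accept any replica; strict reads verify the version."""
        replica_version, entry = self.data_nodes[0].get(object_id, (0, None))
        if level is ConsistencyLevel.RELAXED:
            return entry  # possibly stale, but avoids version verification
        with self.lock:
            latest = self.versions.get(object_id, 0)
        if replica_version == latest:
            return entry
        # Stale replica: serve the authoritative copy and repair the replica.
        _, fresh = self.metadata[object_id]
        self.data_nodes[0][object_id] = (latest, fresh)
        return fresh

    def check_consistency(self, interval=5.0):
        """Periodically compare replica versions and correct stale entries."""
        while True:
            with self.lock:
                snapshot = dict(self.metadata)
            for object_id, (version, entry) in snapshot.items():
                for node in self.data_nodes:
                    if node.get(object_id, (0, None))[0] < version:
                        node[object_id] = (version, entry)
            time.sleep(interval)
```

Under these assumptions, the relaxed level skips version verification entirely on the read path, which is the trade of freshness for throughput that the reported 50% write and 16% read improvements over strict consistency suggest.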
