MDS: In-Depth Insight

We are in the era of Big Data. Terabytes of data are now commonplace, and metadata has grown correspondingly large. Metadata at this scale becomes a barrier in exascale computing. However, careful design and fine-tuning of metadata management can improve the performance of a file system. The design of large-scale metadata servers (MDS) has therefore become a key research topic. Designing an MDS is a prominent challenge as the metadata grows and becomes unmanageable, and managing and serving very large-scale metadata is a challenging task as well. Distributed metadata servers (dMDS) are becoming mature enough to serve metadata at this scale. In this paper, we present state-of-the-art metadata server technology and discuss the issues and challenges of dMDS.
