A Survey on Distributed File System Technology

Distributed file systems provide a fundamental abstraction for location-transparent, permanent storage. They allow distributed processes to cooperate on hierarchically organized data beyond the lifetime of each individual process. The great power of the file system interface lies in the fact that applications do not need to be modified in order to use distributed storage. On the other hand, the general and simple file system interface makes it notoriously difficult for a distributed file system to perform well under a variety of workloads. This has led to today's landscape of several popular distributed file systems, each tailored to a specific use case. Early distributed file systems merely executed file system calls on a remote server, which limited their scalability and resilience to failures. Such limitations have been greatly reduced by modern techniques such as distributed hash tables, content-addressable storage, distributed consensus algorithms, and erasure codes. In light of upcoming scientific data volumes at the exabyte scale, two trends are emerging. First, the previously monolithic design of distributed file systems is being decomposed into services that independently provide a hierarchical namespace, data access, and distributed coordination. Second, the segregation of storage and computing resources is giving way to a storage architecture in which every compute node also participates in providing persistent storage.
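To make one of the techniques named above more concrete, the following is a minimal sketch of content-addressable storage. It assumes an in-memory dictionary in place of real storage servers and SHA-256 as the hash function; the class and method names are hypothetical and are not taken from any particular distributed file system.

```python
import hashlib


class ContentAddressableStore:
    """Hypothetical toy store: objects are keyed by the SHA-256 digest of
    their bytes rather than by a user-chosen name (assumption: in-memory)."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # The key is derived from the content itself, so identical data
        # always maps to the same key and is stored only once.
        key = hashlib.sha256(data).hexdigest()
        self._objects.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        data = self._objects[key]
        # Re-hashing on read lets a client detect corrupted or tampered data.
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("content does not match its address")
        return data

    def __len__(self) -> int:
        return len(self._objects)


store = ContentAddressableStore()
k1 = store.put(b"calibration data")
k2 = store.put(b"calibration data")  # duplicate content, same key
print(k1 == k2, len(store))
```

Because the address is a function of the content, deduplication and integrity checks come for free, and any node holding an object can serve it without coordinating on names; a real system would additionally distribute the key space across servers, for example with a distributed hash table.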
