A Taxonomy of Distributed Storage Systems

This paper presents a taxonomy of key topics affecting research and development of distributed storage systems. The taxonomy finds distributed storage systems to offer a wide array of functionality, employ architectures with varying degrees of centralisation and operate across environments with varying trust and scalability. Furthermore, taxonomies on autonomic management, federation, consistency and routing provide an insight into challenges faced by distributed storage systems and the research to overcome them. The paper continues by providing a survey of distributed storage systems which exemplify topics covered in the taxonomy. The selection of surveyed systems covers a variety of storage systems, exposing the reader to an array of different problems and solutions employed to overcome these challenges. For each surveyed system we address the underlying operational behaviour, leading into the architecture and algorithms employed in the design and development of the system. Our survey covers systems from the past and present, concluding with a discussion on the evolution of distributed storage systems and possible future work.
