Distributed access to parallel file systems

Large data stores are pushing the limits of modern technology. Parallel file systems provide high I/O throughput to large data stores, but they are tied to particular operating systems and hardware platforms, lack seamless integration and modern security features, and suffer from slow offsite performance. Meanwhile, advanced research collaborations require higher bandwidth as well as concurrent and secure access to large datasets across myriad platforms and parallel file systems, forming a schism between file systems and their users. It is my thesis that a distributed file system can improve I/O throughput to modern parallel file system architectures, achieving new levels of scalability, performance, security, heterogeneity, transparency, and independence. This dissertation describes and examines prototypes of three data access architectures that use the NFSv4 distributed filing protocol as a foundation for remote data access to parallel file systems while maintaining file system independence. The first architecture, Split-Server NFSv4, targets parallel file system architectures that disallow customization, direct storage access, or both. Split-Server NFSv4 distributes I/O across the available parallel file system nodes, offering secure, heterogeneous, and transparent remote data access. While scalable, the Split-Server NFSv4 prototype demonstrates that the absence of direct data access limits I/O throughput. Remote data access performance can be increased for parallel file system architectures that allow direct data access plus some customization. For the second architecture, I analyze the pNFS protocol, which uses storage-specific layout drivers to distribute I/O across the bisection bandwidth of a storage network between filing nodes and storage. Storage-specific layout drivers allow universal storage protocol support and flexible security and data access semantics, but can diminish the level of heterogeneity and transparency.
The third architecture, Direct-pNFS, uses a commodity distributed file system for direct access to a parallel file system's storage nodes, bridging the gap between performance and transparency. The dissertation explains why both direct data access architectures are needed, depending on user and system requirements. I analyze prototypes of both direct data access architectures and demonstrate their ability to match and even exceed the performance of the underlying parallel file system.
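
To make the idea of distributing I/O across storage nodes concrete, the following is a minimal sketch of round-robin striping, one common way a client can map a byte range of a file onto several storage nodes. It is an illustration under stated assumptions, not the layout scheme of any architecture described above: the function name `split_request` and the parameters `stripe_unit` and `num_servers` are hypothetical, and real layouts may use different placement rules.

```python
from typing import List, Tuple


def split_request(offset: int, length: int,
                  stripe_unit: int, num_servers: int) -> List[Tuple[int, int, int]]:
    """Split a byte range into per-server pieces under round-robin striping.

    Hypothetical helper for illustration only. Returns a list of
    (server_index, file_offset, chunk_length) tuples in file order.
    """
    if offset < 0 or length < 0 or stripe_unit <= 0 or num_servers <= 0:
        raise ValueError("invalid striping parameters")

    pieces = []
    end = offset + length
    while offset < end:
        # Which stripe unit does this offset fall in, and which server holds it?
        stripe_index = offset // stripe_unit
        server = stripe_index % num_servers
        # A piece never crosses a stripe-unit boundary.
        chunk_end = min(end, (stripe_index + 1) * stripe_unit)
        pieces.append((server, offset, chunk_end - offset))
        offset = chunk_end
    return pieces


if __name__ == "__main__":
    # A 10-byte read at offset 0, 4-byte stripe units, 3 storage nodes.
    for piece in split_request(0, 10, 4, 3):
        print(piece)
```

Because consecutive stripe units land on different nodes, a large sequential request in this sketch can be issued to several nodes in parallel, which is the general reason direct, striped access can raise aggregate throughput.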

[117]  Bianca Schroeder,et al.  A Large-Scale Study of Failures in High-Performance Computing Systems , 2006, IEEE Transactions on Dependable and Secure Computing.