GekkoFS - A Temporary Distributed File System for HPC Applications

We present GekkoFS, a temporary, highly scalable burst buffer file system that has been specifically optimized for the new access patterns of data-intensive High-Performance Computing (HPC) applications. The file system provides relaxed POSIX semantics, offering only the features that most (not all) applications actually require. It delivers scalable I/O performance and reaches millions of metadata operations per second even on a small number of nodes, significantly outperforming general-purpose parallel file systems.
