Accelerating Distributed Updates with Asynchronous Ordered Writes in a Parallel File System

The ordered-writes mechanism is an efficient and widely used way to guarantee the consistency of distributed updates in a parallel file system. To preserve the write order, remote commit operations must not be sent until the local updates have been forced to stable storage. However, this can block the execution of applications and significantly degrade overall performance: the I/O and network latency of commit requests imposes a non-negligible cost on file updates, especially for workloads with large numbers of small files. In this paper, we argue that responsibility for preserving the write order can be handed over from applications to the file system, i.e., that order keeping can be removed from the critical I/O path of applications. We propose the Delayed Commit Protocol, in which requests to commit sub-operations are submitted to a commit queue while control returns to the application immediately. To reduce the total I/O and network overhead, we use space delegation and adaptive RPC (Remote Procedure Call) compound techniques. Experiments show a speedup of up to 2.6x when the protocol is applied to a CDN (Content Delivery Network) benchmark. No performance degradation occurs for workloads with large files or conflicting operations.
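
The idea of moving order keeping off the application's critical path can be illustrated with a minimal sketch. It is not the paper's implementation: the class `DelayedCommitQueue`, the callbacks `flush_local` and `send_compound`, and the `max_batch` parameter are all hypothetical names chosen for illustration. The fixed-size greedy batching only stands in for the adaptive RPC compound technique, and space delegation is not modeled.

```python
import queue
import threading

_STOP = object()  # sentinel used to shut the worker down


class DelayedCommitQueue:
    """Hypothetical sketch of a commit queue that preserves the
    'local updates stable before remote commit' order in the background."""

    def __init__(self, flush_local, send_compound, max_batch=8):
        # flush_local(batch): assumed callback that forces the local updates
        #   belonging to these commits to stable storage.
        # send_compound(batch): assumed callback that sends one RPC carrying
        #   every commit in the batch.
        self._flush_local = flush_local
        self._send_compound = send_compound
        self._max_batch = max_batch
        self._pending = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, commit):
        """Enqueue a commit and return to the caller immediately."""
        self._pending.put(commit)

    def close(self):
        """Drain the queue, then stop the background worker."""
        self._pending.put(_STOP)
        self._worker.join()

    def _run(self):
        stopping = False
        while not stopping:
            first = self._pending.get()  # block until work arrives
            if first is _STOP:
                break
            batch = [first]
            # Greedily gather whatever else is already queued (simplified
            # stand-in for an adaptive compounding policy).
            while len(batch) < self._max_batch:
                try:
                    nxt = self._pending.get_nowait()
                except queue.Empty:
                    break
                if nxt is _STOP:
                    stopping = True
                    break
                batch.append(nxt)
            # Write order: local updates become stable first,
            # and only then is the remote commit sent.
            self._flush_local(batch)
            self._send_compound(batch)
```

A caller would invoke `submit()` after each update and continue working; the single worker thread keeps commits in submission order and never sends a batch before its local flush has returned.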
