Design of an active storage cluster file system for DAG workflows

We present the conceptual design of Confuga, a cluster file system designed to meet the needs of DAG-structured workflows. Today's premier cluster file system Hadoop is commonly used to support large peta-scale data sets on commodity hardware and to exploit active storage through Map-Reduce, a specific workflow pattern. Unfortunately, DAG-structured workflows have very different requirements from Map-Reduce workflows: whole-file access is standard and multiple dependencies are common. Confuga will meet these new requirements by replicating rather than striping files as in Hadoop, by exploiting DAG-structured workflow consistency semantics, and by permitting multiple dependencies in job descriptions. To the end user, Confuga will appear as a drop-in replacement for a batch system and a file system, combined into a single entity that can be invoked by existing workflow managers. In this paper, we describe the design philosophy of Confuga, sketch the major components of the system, and explain how the system will behave under expected workloads.

[1]  Frank B. Schmuck,et al.  GPFS: A Shared-Disk File System for Large Computing Clusters , 2002, FAST.

[2]  Brent Welch POSIX IO extensions for HPC , 2005 .

[3]  Joel H. Saltz,et al.  Active disks: programming model, algorithms and evaluation , 1998, ASPLOS VIII.

[4]  Yuan Yu,et al.  Dryad: distributed data-parallel programs from sequential building blocks , 2007, EuroSys '07.

[5]  Mahadev Satyanarayanan,et al.  Scale and performance in a distributed file system , 1987, SOSP '87.

[6]  Sanjay Ghemawat,et al.  MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.

[7]  Jarek Nieplocha,et al.  Active Storage Processing in a Parallel File System , 2006 .

[8]  Robert B. Ross,et al.  Small-file access in parallel file systems , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.

[9]  Lustre : A Scalable , High-Performance File System Cluster , 2003 .

[10]  Jim Zelenka,et al.  File server scaling with network-attached secure disks , 1997, SIGMETRICS '97.

[11]  Robert B. Ross,et al.  PVFS: A Parallel File System for Linux Clusters , 2000, Annual Linux Showcase & Conference.

[12]  Christos Faloutsos,et al.  Active Disks for Large-Scale Data Processing , 2001, Computer.

[13]  Tao Yang,et al.  The Panasas ActiveScale Storage Cluster - Delivering Scalable High Bandwidth Storage , 2004, Proceedings of the ACM/IEEE SC2004 Conference.

[14]  Christos Faloutsos,et al.  Active Storage for Large-Scale Data Mining and Multimedia , 1998, VLDB.

[15]  Douglas Thain,et al.  Makeflow: a portable abstraction for data intensive computing on clusters, clouds, and grids , 2012, SWEET '12.

[16]  Douglas Thain,et al.  Chirp: a practical global filesystem for cluster and Grid computing , 2008, Journal of Grid Computing.

[17]  Sadaf R. Alam,et al.  Parallel I/O and the metadata wall , 2011, PDSW '11.

[18]  M. Livny,et al.  PARROT: AN APPLICATION ENVIRONMENT FOR DATA-INTENSIVE COMPUTING ((PREPRINT VERSION)) , 2005 .

[19]  Miron Livny,et al.  Condor-a hunter of idle workstations , 1988, [1988] Proceedings. The 8th International Conference on Distributed.

[20]  Sean Quinlan,et al.  Venti: A New Approach to Archival Storage , 2002, FAST.

[21]  Tom White,et al.  Hadoop: The Definitive Guide , 2009 .

[22]  Wolfgang Gentzsch,et al.  Sun Grid Engine: towards creating a compute power grid , 2001, Proceedings First IEEE/ACM International Symposium on Cluster Computing and the Grid.

[23]  Jarek Nieplocha,et al.  Evaluation of active storage strategies for the lustre parallel file system , 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).

[24]  Cezary Dubnicki,et al.  HydraFS: A High-Throughput File System for the HYDRAstor Content-Addressable Storage System , 2010, FAST.

[25]  濱野 純 入門Git : The fast version control system , 2009 .

[26]  David A. Patterson,et al.  A case for intelligent disks (IDISKs) , 1998, SGMD.

[27]  Douglas Thain,et al.  Parrot: Transparent User-Level Middleware for Data-Intensive Computing , 2005, Scalable Comput. Pract. Exp..

[28]  Michael Brudno,et al.  SHRiMP: Accurate Mapping of Short Color-space Reads , 2009, PLoS Comput. Biol..

[29]  Dan Walsh,et al.  Design and implementation of the Sun network filesystem , 1985, USENIX Conference Proceedings.