WaFS: A Workflow-Aware File System for Effective Storage Utilization in the Cloud

We present WaFS, a user-level file system, and a related scheduling algorithm for scientific workflow computation in the cloud. WaFS's primary design goal is not high-performance file access but the automatic detection and collection of explicit and implicit data dependencies between workflow jobs. Using the dependency information that WaFS gathers, a workflow scheduler can either make effective cost-performance tradeoffs or improve storage utilization. Proper resource provisioning and storage utilization on pay-as-you-go clouds can be more cost effective than the use of resources in traditional HPC systems. WaFS and the scheduler control the number of concurrent workflow instances at runtime so that the storage is well used, while the total makespan (i.e., the turnaround time for a workload) is not severely compromised. We describe the design and implementation of WaFS and the new workflow scheduling algorithm, which builds on our previous work. We present empirical evidence that the overheads of our prototype WaFS are acceptable, and we describe a simulation-based study, using representative workflows, that shows the makespan benefits of our WaFS-enabled scheduling algorithm.
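
As a rough illustration of how data dependencies between jobs might be inferred from observed file accesses, the following Python sketch treats a job as dependent on whichever other job last wrote a file that it reads. This is not the WaFS implementation: the function name `infer_dependencies`, the `(job, path, mode)` log format, and the sample job and file names are all hypothetical, chosen only to make the idea concrete.

```python
from collections import defaultdict


def infer_dependencies(access_log):
    """Infer job-to-job data dependencies from a time-ordered access log.

    access_log is an iterable of (job, path, mode) tuples, where mode is
    "r" for a read and "w" for a write. This record format is an
    assumption made for illustration, not the format used by WaFS.
    """
    last_writer = {}          # path -> job that most recently wrote it
    deps = defaultdict(set)   # consumer job -> set of producer jobs
    for job, path, mode in access_log:
        if mode == "r":
            writer = last_writer.get(path)
            # A read of a file written by a different job implies a dependency.
            if writer is not None and writer != job:
                deps[job].add(writer)
        elif mode == "w":
            last_writer[path] = job
    return {job: sorted(producers) for job, producers in deps.items()}


# Hypothetical access log for a small split/align/merge workflow.
log = [
    ("split", "/data/part1", "w"),
    ("split", "/data/part2", "w"),
    ("align1", "/data/part1", "r"),
    ("align1", "/data/out1", "w"),
    ("align2", "/data/part2", "r"),
    ("align2", "/data/out2", "w"),
    ("merge", "/data/out1", "r"),
    ("merge", "/data/out2", "r"),
]
deps = infer_dependencies(log)
```

In this sketch, `deps` maps each consumer job to the producers it reads from, which is the kind of dependency graph a scheduler could consult when deciding how many workflow instances to admit concurrently.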
