Easy and instantaneous processing for data-intensive workflows

This paper presents a lightweight and scalable framework that enables non-privileged users to easily and instantly describe, deploy, and execute data-intensive workflows on arbitrary computing resources drawn from clusters, clouds, and supercomputers. The framework consists of three major components: the GXP parallel/distributed shell as the resource explorer and framework back-end, the GMount distributed file system as the underlying data-sharing layer, and GXP Make as the workflow engine. With this framework, domain researchers can intuitively write workflow descriptions as GNU make rules and harness resources across administrative domains with low learning and setup costs. By executing real-world scientific applications with this framework on multi-cluster and supercomputer platforms, we demonstrate that it delivers practically useful performance and is suitable for common data-intensive workflow practice in various distributed computing environments.
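To illustrate how a workflow can be written as GNU make rules, the sketch below shows a hypothetical two-stage word-count pipeline. The file names, directory layout, and commands are illustrative assumptions, not taken from the paper; the point is that ordinary make dependency rules express both the parallel per-file stage and the final reduction.

```make
# Hypothetical workflow as GNU make rules (illustrative sketch).
# Stage 1 runs one independent task per input file; stage 2 is a
# reduction that depends on all stage-1 outputs.

INPUTS := $(wildcard data/*.txt)
COUNTS := $(patsubst data/%.txt,out/%.count,$(INPUTS))

all: out/total.count

# Stage 1: independent tasks; a parallel make (or GXP Make on
# distributed resources) can run these rules concurrently.
out/%.count: data/%.txt
	mkdir -p out
	wc -w < $< > $@

# Stage 2: aggregate all per-file counts into a single total.
out/total.count: $(COUNTS)
	cat $^ | awk '{s+=$$1} END {print s}' > $@
```

Running this with `make -j` executes the independent stage-1 rules in parallel on one machine; GXP Make reuses the same dependency semantics to dispatch such tasks across nodes gathered by GXP, with GMount providing the shared files.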
