Swift: Fast, Reliable, Loosely Coupled Parallel Computation

We present Swift, a system that combines a novel scripting language called SwiftScript with a powerful runtime system based on CoG Karajan, Falkon, and Globus to enable the concise specification and the reliable, efficient execution of large loosely coupled computations. Swift adopts and adapts ideas first explored in the GriPhyN virtual data system, improving on that system in many regards. We describe the SwiftScript language and its use of XDTM to describe the logical organization of complex file system structures. We also present the Swift runtime system and its use of CoG Karajan, Falkon, and Globus services to dispatch and manage the execution of many tasks in parallel and grid environments. Finally, we describe application experiences and performance experiments that quantify the cost of Swift operations.
