Design and Implementation of GXP Make - A Workflow System Based on Make

This paper describes the rationale behind designing workflow systems based on Unix make by presenting a number of idioms useful for workflows comprising many tasks. It also presents the design and implementation of one such workflow system, GXP make. GXP make supports all the features of GNU make and extends its platforms from single-node systems to clusters, clouds, supercomputers, and distributed systems. Notably, this is achieved with a very small code base that does not modify the GNU make implementation at all. Although its performance is not ideal, GXP make achieves useful performance and scalability, dispatching one million tasks in approximately 16,000 seconds (about 60 tasks per second, including dependence analysis) on an 8-core Intel Nehalem node. As a real application, the paper describes the recognition and classification of protein-protein interactions from biomedical texts on a supercomputer with more than 8,000 cores.
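The abstract does not spell out how make's model becomes a workflow engine. The Python sketch below is a rough illustration only. It is not GXP make's implementation, which the abstract says reuses GNU make unmodified. It shows the core behavior a make-based workflow inherits. Each rule names a target, its prerequisites, and a recipe. A target is rebuilt only when it is missing or older than a prerequisite. Any task whose prerequisites are finished can be dispatched immediately. A local thread pool stands in for the cluster nodes a system like GXP make dispatches to. `Rule`, `run_workflow`, `demo`, and the `upper`/`concat` recipes are hypothetical names, and recipes are Python callables rather than shell commands.

```python
import os
import tempfile
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass
class Rule:
    """One make-style rule; `recipe(target, prereqs)` is a hypothetical stand-in for a shell recipe."""
    target: str
    prereqs: list[str]
    recipe: Callable[[str, list[str]], None]


def out_of_date(rule: Rule) -> bool:
    """make's test: the target is missing or older than one of its prerequisites."""
    if not os.path.exists(rule.target):
        return True
    mtime = os.path.getmtime(rule.target)
    return any(os.path.getmtime(p) > mtime for p in rule.prereqs)


def run_workflow(rules: list[Rule], goal: str, workers: int = 4) -> list[str]:
    """Bring `goal` up to date concurrently; return the rebuilt targets in completion order."""
    by_target = {r.target: r for r in rules}
    needed, stack = set(), [goal]  # like make, visit only what the goal needs
    while stack:
        t = stack.pop()
        if t in by_target and t not in needed:
            needed.add(t)
            stack.extend(by_target[t].prereqs)
        elif t not in by_target and not os.path.exists(t):
            raise FileNotFoundError(f"no rule to make target '{t}'")

    dependents = {t: [] for t in needed}
    waiting = {t: 0 for t in needed}  # unfinished prerequisites of t that are targets
    for t in needed:
        for p in set(by_target[t].prereqs) & needed:
            waiting[t] += 1
            dependents[p].append(t)
    ready = [t for t in needed if waiting[t] == 0]
    completed, remade, rebuilt = set(), set(), []

    def finish(t: str) -> None:
        # Mark t done; dependents whose prerequisites are all done become ready.
        completed.add(t)
        for d in dependents[t]:
            waiting[d] -= 1
            if waiting[d] == 0:
                ready.append(d)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        running = {}  # future -> target; the thread pool stands in for remote workers
        while ready or running:
            while ready:
                rule = by_target[ready.pop()]
                # Rebuild if stale or if a prerequisite was remade in this run.
                if out_of_date(rule) or remade.intersection(rule.prereqs):
                    running[pool.submit(rule.recipe, rule.target, rule.prereqs)] = rule.target
                else:
                    finish(rule.target)  # already up to date
            if running:
                done, _ = wait(running, return_when=FIRST_COMPLETED)
                for fut in done:
                    target = running.pop(fut)
                    fut.result()  # re-raise a failed recipe (make also stops unless -k)
                    remade.add(target)
                    rebuilt.append(target)
                    finish(target)

    if completed != needed:
        raise RuntimeError(f"circular dependency among {sorted(needed - completed)}")
    return rebuilt


def upper(target: str, prereqs: list[str]) -> None:
    Path(target).write_text(Path(prereqs[0]).read_text().upper())


def concat(target: str, prereqs: list[str]) -> None:
    Path(target).write_text("".join(Path(p).read_text() for p in prereqs))


def demo() -> tuple[list[str], list[str], str]:
    """Three independent `upper` tasks feed one `concat` task; build twice."""
    with tempfile.TemporaryDirectory() as d:
        ins = [Path(d, f"in{i}.txt") for i in range(3)]
        for i, p in enumerate(ins):
            p.write_text(f"chunk {i}\n")
        outs = [str(p.with_suffix(".up")) for p in ins]
        rules = [Rule(o, [str(p)], upper) for p, o in zip(ins, outs)]
        final = str(Path(d, "all.up"))
        rules.append(Rule(final, outs, concat))
        first = run_workflow(rules, final, workers=3)
        second = run_workflow(rules, final, workers=3)  # nothing is stale now
        return first, second, Path(final).read_text()
```

In `demo()`, the first run rebuilds all four targets, and the three `upper` tasks may run concurrently. The second run rebuilds nothing. This is the make property that lets a partially failed workflow be restarted: only missing or stale targets are recomputed.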
