GEL: Grid execution language

We consider the problem of programming parallel applications for a Grid environment, where the two main challenges are (i) high-latency communication and (ii) heterogeneity. We describe a new scripting language, GEL, whose semantics are designed for execution on a heterogeneous, distributed computer. The language provides syntactic constructs for while loops, conditionals and explicitly parallel execution. It is designed to cope with both challenges and to allow succinct representation of parallel programs, which makes the code easier to maintain. GEL programs can use legacy applications without re-engineering, and they neither refer explicitly to resource names nor contain middleware-specific references. This middleware independence allows the same script to be executed on an SMP machine, a cluster or a Grid. We describe three example applications written in GEL: an optimisation problem solved using a swarm algorithm; an allergenicity prediction pipeline; and transcript analysis for tissue-specific gene expression. We have run these scripts unchanged on an SMP machine, on PBS, SGE and LSF clusters, and on a Globus-based Grid.
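The separation between a script and the middleware that runs it can be illustrated with a small model. The sketch below is written in Python and is not GEL syntax or the GEL interpreter; the node types (Job, Seq, Par, If, While), the LocalBackend class and the execute function are all hypothetical names introduced for illustration. It assumes only what the abstract states: a script is built from sequencing, conditionals, while loops and explicitly parallel blocks, and it never names a resource, so the backend can be swapped without touching the script.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Optional, Union


@dataclass
class Job:
    """A unit of work; `run` stands in for launching a legacy executable."""
    name: str
    run: Callable[[], None]


@dataclass
class Seq:
    steps: List["Node"]


@dataclass
class Par:
    branches: List["Node"]


@dataclass
class If:
    cond: Callable[[], bool]
    then: "Node"
    otherwise: Optional["Node"] = None


@dataclass
class While:
    cond: Callable[[], bool]
    body: "Node"


Node = Union[Job, Seq, Par, If, While]


class LocalBackend:
    """Hypothetical backend: runs jobs as threads on one machine.
    A cluster or Grid backend would expose the same submit() method."""

    def __init__(self, workers: int = 4) -> None:
        self.pool = ThreadPoolExecutor(max_workers=workers)

    def submit(self, job: Job):
        return self.pool.submit(job.run)


def execute(node: Node, backend) -> None:
    """Walk the script tree; only Job nodes ever touch the backend."""
    if isinstance(node, Job):
        backend.submit(node).result()
    elif isinstance(node, Seq):
        for step in node.steps:
            execute(step, backend)
    elif isinstance(node, Par):
        # Start every branch at once, then wait for all of them to finish.
        width = max(1, len(node.branches))
        with ThreadPoolExecutor(max_workers=width) as coordinator:
            futures = [coordinator.submit(execute, b, backend)
                       for b in node.branches]
            for future in futures:
                future.result()
    elif isinstance(node, If):
        if node.cond():
            execute(node.then, backend)
        elif node.otherwise is not None:
            execute(node.otherwise, backend)
    elif isinstance(node, While):
        while node.cond():
            execute(node.body, backend)


# Toy script shaped like an iterative optimisation: in each round,
# evaluate four candidates in parallel, then run one update step.
state = {"iter": 0, "evals": []}
lock = threading.Lock()


def evaluate(i: int) -> Callable[[], None]:
    def run() -> None:
        with lock:
            state["evals"].append((state["iter"], i))
    return run


def update() -> None:
    state["iter"] += 1


script = While(
    lambda: state["iter"] < 3,
    Seq([
        Par([Job(f"eval{i}", evaluate(i)) for i in range(4)]),
        Job("update", update),
    ]),
)

execute(script, LocalBackend())
```

In this model the script value contains no host names or scheduler calls, so running it elsewhere would only require passing a different backend object to execute.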
