Necessity is the mother of invention: a simple grid computing system using commodity tools

Lack of access to sufficient computing resources is a barrier to scientific progress for many researchers facing large computational problems. Gaining access to large-scale resources (i.e., university-wide or federally supported computer centers) can be difficult, given their limited availability, particular architectures, and request/review/approval cycles. At the same time, researchers often have access to workstations and older clusters that their owners have overlooked in favor of newer hardware. Software to tie these resources into a coherent Grid, however, has been problematic. Here, we describe our experiences building a Grid computing system to conduct a large-scale simulation study using "borrowed" computing resources distributed over a wide area. Using standard software components, we have produced a Grid computing system capable of coupling several hundred processors spanning multiple continents and administrative domains. We believe that this system fills an important niche between a closely coupled local system and a heavyweight, highly customized wide area system.
