A Hybrid Computational Grid Architecture for Comparative Genomics

Comparative genomics provides a powerful tool for studying evolutionary changes among organisms, helping to identify genes that are conserved among species as well as genes that give each organism its unique characteristics. However, the huge datasets involved make this approach impractical on traditional computer architectures, where runtimes become prohibitively long. In this paper, we present a new computational grid architecture based on a hybrid computing model that significantly accelerates comparative genomics applications. The hybrid computing model combines two types of parallelism: coarse-grained and fine-grained. The coarse-grained parallelism uses a volunteer computing infrastructure for job distribution, while the fine-grained parallelism uses commodity computer graphics hardware for fast sequence alignment. We present the deployment and evaluation of this approach on our grid test bed for the all-against-all comparison of microbial genomes. The results of this comparison are then used by the phenotype-genotype explorer (PheGee), a new tool that nominates candidate genes responsible for a given phenotype.
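The two levels of parallelism can be illustrated with a minimal Python sketch. It is not the implementation described in the paper. It assumes that the all-against-all comparison is split into work units of sequence pairs (the coarse-grained level, which a volunteer computing server would hand out to clients), and that each pair is scored with a Smith-Waterman local alignment (the fine-grained level, which the paper offloads to graphics hardware and which runs on the CPU here). The function names, the scoring parameters, and the work-unit size are hypothetical.

```python
from itertools import combinations


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score of a and b (linear gap penalty).

    The scoring values are illustrative assumptions, not taken from the paper.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def make_work_units(names, unit_size):
    """Split all unordered pairs of distinct names into fixed-size work units."""
    pairs = list(combinations(names, 2))
    return [pairs[i:i + unit_size] for i in range(0, len(pairs), unit_size)]


def run_work_unit(unit, sequences):
    """Score every pair in one work unit, as a single client would."""
    return {(x, y): smith_waterman_score(sequences[x], sequences[y])
            for x, y in unit}


if __name__ == "__main__":
    # Toy sequences standing in for genome data (hypothetical).
    sequences = {"g1": "ACGTACGT", "g2": "ACGTTCGT", "g3": "TTTTGGGG"}
    results = {}
    for unit in make_work_units(sorted(sequences), unit_size=2):
        results.update(run_work_unit(unit, sequences))
    for pair, score in sorted(results.items()):
        print(pair, score)
```

Because each work unit depends only on its own pairs, units can be processed in any order and on any client. That independence is what makes the comparison suitable for volunteer computing.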
