PheGee@Home: A Grid-Based Tool for Comparative Genomics

In this chapter we present PheGee@Home, a grid-based comparative genomics tool that nominates candidate genes responsible for a given phenotype. A phenotype is the physical manifestation of the interplay of genetic, epigenetic, and environmental factors. The tool is designed to facilitate the discovery and prioritization of candidate genes that control or contribute to the genetically determined portion of a specified phenotype. Reliable nomination of candidate genes from sequence data, however, requires several genome-size sequence datasets, which leads to prohibitively long runtimes on traditional computer architectures and makes the approach impractical there. We therefore use a computational architecture based on a desktop grid environment and commodity graphics hardware to significantly accelerate PheGee. We validate this approach by deploying and evaluating it on a grid testbed for the comparison of microbial genomes.
