gcodeml: A Grid-enabled Tool for Detecting Positive Selection in Biological Evolution

One of the important questions in biological evolution is whether certain changes along protein-coding genes have contributed to the adaptation of species. This problem is biologically complex and computationally very expensive, so it requires efficient Grid or cluster solutions. We have developed a Grid-enabled tool, gcodeml, that relies on the codeml program of the PAML package to help analyse large phylogenetic datasets on both Grids and computational clusters. Although we report results for gcodeml, our approach is applicable and customisable to related problems in biology and other scientific domains.
