ProtTest-HPC: Fast Selection of Best-Fit Models of Protein Evolution

Probabilistic models of amino acid replacement are essential for the study of protein evolution, and programs such as ProtTest implement different strategies to identify the best-fit model for the data at hand. For large protein alignments, this task can demand vast computational resources, which can prevent researchers from justifying the model used in the analysis. We have implemented a high-performance computing (HPC) version of ProtTest, ProtTest-HPC, which runs in parallel in two forms: (1) a GUI-based desktop version that uses multi-core processors and (2) a cluster-based version that distributes the computational load among nodes. ProtTest-HPC delivered significant performance gains, with speedups of up to 50 on a high-performance cluster.
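To make the parallelization idea concrete, the following is a minimal sketch of the general pattern: each candidate model is an independent task, a pool of workers evaluates the tasks concurrently, and the best-scoring model is picked at the end. It is not ProtTest-HPC's actual code. The model names, parameter counts, and log-likelihoods are invented placeholders, `evaluate` and `select_best` are hypothetical helpers, and the use of the Akaike information criterion (AIC = 2k - 2 ln L) as the score is an assumption made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical candidates: name -> (free parameters k, log-likelihood ln L).
# In a real analysis ln L would come from an expensive likelihood
# optimization; these numbers are made up purely for illustration.
CANDIDATES = {
    "ModelA": (10, -12050.0),
    "ModelB": (11, -12010.0),
    "ModelC": (30, -12005.0),
}


def evaluate(name):
    """Stand-in for one independent model optimization; returns (name, AIC)."""
    k, log_likelihood = CANDIDATES[name]
    aic = 2 * k - 2 * log_likelihood  # lower AIC means a better fit
    return name, aic


def select_best(names, workers=4):
    """Evaluate all candidate models on a worker pool and return the best."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(evaluate, names))
    return min(results, key=lambda item: item[1])


best_name, best_aic = select_best(list(CANDIDATES))
print(best_name, best_aic)
```

A thread pool is used here only to show the scheduling pattern. For CPU-bound likelihood calculations in Python, a process pool or a message-passing layer across cluster nodes would be needed to obtain real speedups.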
