MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods.

Comparative analysis of molecular sequence data is essential for reconstructing the evolutionary histories of species and for inferring the nature and extent of the selective forces that shape the evolution of genes and species. Here, we announce the release of Molecular Evolutionary Genetics Analysis version 5 (MEGA5), user-friendly software for mining online databases, building sequence alignments and phylogenetic trees, and applying methods of evolutionary bioinformatics in basic biology, biomedicine, and evolution. The newest addition in MEGA5 is a collection of maximum likelihood (ML) analyses for inferring evolutionary trees, selecting best-fit substitution models (nucleotide or amino acid), inferring ancestral states and sequences (along with their probabilities), and estimating evolutionary rates site by site. In computer simulation analyses, the ML tree inference algorithms in MEGA5 compared favorably with other software packages in computational efficiency and in the accuracy of the estimated phylogenetic trees, substitution parameters, and rate variation among sites. The MEGA user interface is now activity driven, which makes it easier to use for both beginners and experienced scientists. This version of MEGA is intended for the Windows platform, and it has been configured for effective use on Mac OS X and Linux desktops. It is available free of charge from http://www.megasoftware.net.
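The abstract mentions selecting best-fit substitution models but does not say how candidates are ranked. A common approach, shown here only as a minimal sketch and not as MEGA5's implementation, is to score each fitted model with an information criterion (AIC, AICc, or BIC) and keep the one with the lowest score. The function names, the log-likelihoods, the parameter counts (substitution parameters only, ignoring branch lengths), and the alignment length below are all hypothetical values chosen for illustration.

```python
import math


def information_criteria(log_likelihood, num_params, num_sites):
    """Return (AIC, AICc, BIC) for one fitted model.

    log_likelihood: maximized log-likelihood of the model
    num_params:     number of free parameters counted for the model
    num_sites:      sample size, taken here to be the alignment length
    """
    aic = 2 * num_params - 2 * log_likelihood
    # Small-sample correction; requires num_sites > num_params + 1.
    aicc = aic + (2 * num_params * (num_params + 1)) / (num_sites - num_params - 1)
    bic = num_params * math.log(num_sites) - 2 * log_likelihood
    return aic, aicc, bic


def best_model_by_bic(fits, num_sites):
    """Pick the model with the lowest BIC.

    fits maps a model name to (log_likelihood, num_params).
    Returns (best_name, {name: bic}).
    """
    scores = {
        name: information_criteria(lnl, k, num_sites)[2]
        for name, (lnl, k) in fits.items()
    }
    return min(scores, key=scores.get), scores


# Hypothetical fits for three nucleotide models on a 1000-site alignment.
fits = {
    "JC": (-2500.0, 0),
    "HKY": (-2450.0, 4),
    "GTR": (-2447.0, 8),
}
best, bic_scores = best_model_by_bic(fits, num_sites=1000)
print(best)
```

With these made-up numbers the intermediate model wins under BIC: the richest model improves the likelihood only slightly, which does not offset the penalty for its extra parameters.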
