MEGA5: Molecular Evolutionary Genetics Analysis Using Maximum Likelihood, Evolutionary Distance, and Maximum Parsimony Methods

Comparative analysis of molecular sequence data is essential for reconstructing the evolutionary histories of species and inferring the nature and extent of selective forces shaping the evolution of genes and species. Here, we announce the release of MEGA5 (Molecular Evolutionary Genetics Analysis version 5), a user-friendly software package for mining online databases, building sequence alignments and phylogenetic trees, and using methods of evolutionary bioinformatics in basic biology, biomedicine, and evolution. The newest addition in MEGA5 is a collection of Maximum Likelihood (ML) analyses for inferring evolutionary trees, selecting best-fit substitution models (nucleotide or amino acid), inferring ancestral states and sequences (along with probabilities), and estimating evolutionary rates site by site. In computer simulation analyses, ML tree inference algorithms in MEGA5 compared favorably with other software packages in terms of computational efficiency and the accuracy of the estimates of phylogenetic trees, substitution parameters, and rate variation among sites. The MEGA user interface has now been enhanced to be activity driven, making it easier to use for both beginners and experienced scientists. This version of MEGA is intended for the Windows platform, and it has been configured for effective use on Mac OS X and Linux desktops. It is available free of charge from www.megasoftware.net.

The Molecular Evolutionary Genetics Analysis (MEGA) software was developed with the goal of providing a biologist-centric, integrated suite of tools for statistical analyses of DNA and protein sequence data from an evolutionary standpoint. Over the years, it has grown to include tools for sequence alignment, phylogenetic reconstruction and phylogeny visualization, testing an array of evolutionary hypotheses, estimating sequence divergences, web-based acquisition of sequence data, and expert systems to generate natural language descriptions of the analysis methods and data chosen by the user (Kumar et al. 2008). With the fifth major release, the collection of analysis tools in MEGA has now broadened to include the Maximum Likelihood (ML) methods for molecular evolutionary analysis. Table 1 contains a summary of all statistical methods and models in MEGA5, with new features marked with an asterisk (*). In the following, we provide a brief description of methodological advancements, along with relevant research results, and technical enhancements in MEGA5.

MEGA5 now contains facilities to evaluate the fit of major models of nucleotide and amino acid substitutions, which are frequently desired by researchers. For nucleotide substitutions, the General Time Reversible (GTR) model and five nested models are available, whereas six models with and without empirical frequencies have been programmed for the …
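To make the idea of evaluating model fit concrete, the sketch below ranks candidate substitution models by two widely used information criteria, the Bayesian Information Criterion (BIC) and the corrected Akaike Information Criterion (AICc). This is an illustration only: the excerpt above does not say which criteria MEGA5 reports, the function names are hypothetical, and the log-likelihoods are made-up numbers rather than results from any analysis. Parameter counts cover only the free substitution-model parameters (0 for Jukes-Cantor, 4 for HKY, 8 for GTR) and ignore branch lengths.

```python
import math


def aicc(log_likelihood, n_params, n_sites):
    """Corrected Akaike Information Criterion (lower is better)."""
    aic = 2 * n_params - 2 * log_likelihood
    # Small-sample correction term; requires n_sites > n_params + 1.
    return aic + (2 * n_params * (n_params + 1)) / (n_sites - n_params - 1)


def bic(log_likelihood, n_params, n_sites):
    """Bayesian Information Criterion (lower is better)."""
    return n_params * math.log(n_sites) - 2 * log_likelihood


def rank_models(fits, n_sites):
    """Return (name, BIC, AICc) tuples sorted from best to worst BIC.

    `fits` maps a model name to (maximized log-likelihood, parameter count).
    """
    scored = [
        (name, bic(lnl, k, n_sites), aicc(lnl, k, n_sites))
        for name, (lnl, k) in fits.items()
    ]
    return sorted(scored, key=lambda row: row[1])


# Hypothetical log-likelihoods for a 1000-site alignment (illustrative only).
example_fits = {
    "JC": (-5200.0, 0),
    "HKY": (-5100.0, 4),
    "GTR": (-5095.0, 8),
}

ranked = rank_models(example_fits, n_sites=1000)
for name, bic_score, aicc_score in ranked:
    print(f"{name}: BIC={bic_score:.2f}  AICc={aicc_score:.2f}")
```

Both criteria trade goodness of fit against the number of parameters, so a richer model such as GTR is preferred only when its gain in log-likelihood outweighs the penalty for its extra parameters.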

[1]  Sudhir Kumar,et al.  Fast and slow implementations of relaxed-clock methods show similar patterns of accuracy in estimating divergence times. , 2011, Molecular biology and evolution.

[2]  Jack Sullivan,et al.  Assessment of substitution model adequacy using frequentist and Bayesian methods. , 2010, Molecular biology and evolution.

[3]  O. Gascuel,et al.  New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. , 2010, Systematic biology.

[4]  Joel Dudley,et al.  MEGA: A biologist-centric software for evolutionary analysis of DNA and protein sequences , 2008, Briefings Bioinform..

[5]  Jack Sullivan,et al.  Does choice in model selection affect maximum likelihood analysis? , 2008, Systematic biology.

[6]  Joel Dudley,et al.  Bioinformatics software for biologists in the genomics era , 2007, Bioinform..

[7]  Alexandros Stamatakis,et al.  RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models , 2006, Bioinform..

[8]  C. G. Schrago.  An empirical examination of the standard errors of maximum likelihood phylogenetic parameters under the molecular clock via bootstrapping. , 2006, Genetics and molecular research : GMR.

[9]  Michael E Alfaro,et al.  Comparative performance of Bayesian and AIC-based measures of phylogenetic model uncertainty. , 2006, Systematic biology.

[10]  T. Pupko,et al.  Site-Specific Evolutionary Rate Inference: Taking Phylogenetic Uncertainty into Account , 2005, Journal of Molecular Evolution.

[11]  Thomas Ludwig,et al.  RAxML-III: a fast program for maximum likelihood-based inference of large phylogenetic trees , 2005, Bioinform..

[12]  Michael P. Cummings,et al.  PAUP* [Phylogenetic Analysis Using Parsimony (and Other Methods)] , 2004 .

[13]  D. Posada,et al.  Model selection and model averaging in phylogenetics: advantages of akaike information criterion and bayesian approaches over likelihood ratio tests. , 2004, Systematic biology.

[14]  Robert C. Edgar,et al.  MUSCLE: a multiple sequence alignment method with reduced time and space complexity , 2004, BMC Bioinformatics.

[15]  M. Nei,et al.  Prospects for inferring very large phylogenies by using the neighbor-joining method. , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[16]  O. Gascuel,et al.  A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. , 2003, Systematic biology.

[17]  Sudhir Kumar,et al.  Heterogeneity of nucleotide frequencies among evolutionary lineages and phylogenetic inference. , 2003, Molecular biology and evolution.

[18]  Sudhir Kumar,et al.  Incomplete taxon sampling is not a problem for phylogenetic inference , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[19]  Andrew Rambaut,et al.  Seq-Gen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees , 1997, Comput. Appl. Biosci..

[20]  O Gascuel,et al.  BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data. , 1997, Molecular biology and evolution.

[21]  M. Nei,et al.  A new method of inference of ancestral nucleotide and amino acid sequences. , 1995, Genetics.

[22]  A Rzhetsky,et al.  Phylogenetic test of the molecular clock and linearized trees. , 1995, Molecular biology and evolution.

[23]  Ziheng Yang.  Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: Approximate methods , 1994, Journal of Molecular Evolution.

[24]  Sudhir Kumar,et al.  MEGA: Molecular Evolutionary Genetics Analysis software for microcomputers , 1994, Comput. Appl. Biosci..

[25]  F. Tajima,et al.  Simple methods for testing the molecular evolutionary clock hypothesis. , 1993, Genetics.

[26]  M. Nei,et al.  Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. , 1993, Molecular biology and evolution.

[27]  William R. Taylor,et al.  The rapid generation of mutation data matrices from protein sequences , 1992, Comput. Appl. Biosci..

[28]  Clifford M. Hurvich,et al.  Regression and time series model selection in small samples , 1989 .

[29]  W. Fitch,et al.  Evidence from nuclear sequences that invariable sites should be considered when sequence divergence is calculated. , 1989, Molecular biology and evolution.

[30]  N. Saitou,et al.  The neighbor-joining method: a new method for reconstructing phylogenetic trees. , 1987, Molecular biology and evolution.

[31]  J. Felsenstein.  Confidence limits on phylogenies: an approach using the bootstrap , 1985, Evolution; international journal of organic evolution.

[32]  G. Schwarz.  Estimating the Dimension of a Model , 1978 .

[33]  Warren J. Ewens,et al.  Likelihood: An account of the statistical concept of likelihood and its application to scientific inference. , 1973 .

[34]  Walter M. Fitch,et al.  A method for estimating the number of invariant amino acid coding positions in a gene using cytochrome c as a model case , 1967, Biochemical Genetics.

[35]  A. G. Pedersen,et al.  Computational Molecular Evolution , 2013 .

[36]  A. Oskooi.  Molecular Evolution and Phylogenetics , 2008 .

[37]  H. Kishino,et al.  Dating of the human-ape splitting by a molecular clock of mitochondrial DNA , 2005, Journal of Molecular Evolution.

[38]  J. Felsenstein.  Evolutionary trees from DNA sequences: A maximum likelihood approach , 2005, Journal of Molecular Evolution.

[39]  David Posada,et al.  MODELTEST: testing the model of DNA substitution , 1998, Bioinform..

[40]  W. Fitch.  An estimation of the number of invariable sites is necessary for the accurate estimation of the number of nucleotide substitutions since a common ancestor. , 1986, Progress in clinical and biological research.