MEGA3: Integrated software for Molecular Evolutionary Genetics Analysis and sequence alignment

With its theoretical basis firmly established in molecular evolutionary and population genetics, comparative DNA and protein sequence analysis plays a central role in reconstructing the evolutionary histories of species and multigene families, estimating rates of molecular evolution, and inferring the nature and extent of the selective forces that shape the evolution of genes and genomes. The scope of these investigations has expanded greatly owing to the development of high-throughput sequencing techniques and of novel statistical and computational methods, and putting these methods to use requires easy-to-use computer programs. The Molecular Evolutionary Genetics Analysis (MEGA) software is one such effort; its focus is on facilitating the exploration and analysis of DNA and protein sequence variation from an evolutionary perspective. Now in its third major release, MEGA3 provides facilities for automatic and manual sequence alignment, web-based mining of databases, inference of phylogenetic trees, estimation of evolutionary distances, and testing of evolutionary hypotheses. This paper provides an overview of the statistical methods and computational tools available in MEGA, and of its visual exploration modules for input data and for the results obtained.
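
As a hedged illustration of what "estimation of evolutionary distances" involves, the sketch below computes three standard nucleotide distances between two aligned DNA sequences: the uncorrected p-distance, the Jukes-Cantor distance, and the Kimura two-parameter distance. It assumes pairwise deletion (any column containing a gap or an ambiguous base is skipped). This is a textbook sketch, not MEGA's code or interface; the function names `compare_sites`, `p_distance`, `jukes_cantor`, and `kimura_2p` are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
VALID = PURINES | PYRIMIDINES


def compare_sites(seq1: str, seq2: str) -> tuple[int, int, int]:
    """Return (compared sites, transitions, transversions) for two aligned
    sequences, skipping columns where either base is a gap or ambiguous
    (pairwise deletion, an assumption of this sketch)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    n = ts = tv = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in VALID or b not in VALID:
            continue
        n += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; purine<->pyrimidine are transversions.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            ts += 1
        else:
            tv += 1
    if n == 0:
        raise ValueError("no comparable sites")
    return n, ts, tv


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of compared sites that differ."""
    n, ts, tv = compare_sites(seq1, seq2)
    return (ts + tv) / n


def jukes_cantor(seq1: str, seq2: str) -> float:
    """Jukes-Cantor distance: d = -3/4 * ln(1 - 4p/3)."""
    arg = 1 - 4 * p_distance(seq1, seq2) / 3
    if arg <= 0:
        raise ValueError("p too large for the Jukes-Cantor correction")
    return -0.75 * math.log(arg)


def kimura_2p(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance with P = transition and
    Q = transversion proportions:
    d = -1/2 * ln(1 - 2P - Q) - 1/4 * ln(1 - 2Q)."""
    n, ts, tv = compare_sites(seq1, seq2)
    p_ts, q_tv = ts / n, tv / n
    a, b = 1 - 2 * p_ts - q_tv, 1 - 2 * q_tv
    if a <= 0 or b <= 0:
        raise ValueError("divergence too large for the Kimura correction")
    return -0.5 * math.log(a) - 0.25 * math.log(b)


if __name__ == "__main__":
    s1, s2 = "ACGTACGTAC", "ACGTGCGTTC"  # one transition, one transversion
    print(p_distance(s1, s2))              # 0.2
    print(round(jukes_cantor(s1, s2), 4))  # 0.2326
    print(round(kimura_2p(s1, s2), 4))     # 0.2341
```

The two corrected distances exceed the raw p-distance because they account for multiple substitutions at the same site; the Kimura model additionally weights transitions and transversions separately.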
