Maximum-likelihood approach for gene family evolution under functional divergence.

Based on the observed alignment pattern (i.e., the amino acid configuration), we studied two basic types of functional divergence in a protein family. Type I functional divergence after gene duplication results in altered functional constraints (i.e., different evolutionary rates) between duplicate genes, whereas type II results in no altered functional constraints but a radical change in amino acid properties (e.g., charge or hydrophobicity) between them. Two statistical approaches, the subtree likelihood and the whole-tree likelihood, were developed for estimating the coefficients of type I or type II functional divergence. Numerical algorithms for obtaining maximum-likelihood estimates are also provided. Moreover, a posterior-based site-specific profile is implemented to predict the critical amino acid residues responsible for type I and/or type II functional divergence after gene duplication. Using examples, we compared the current likelihood approach with a fast method developed previously; the two give similar results. For handling altered functional constraints (type I functional divergence) in large gene families with many member genes (clusters), which appears to be the normal case in postgenomics, the subtree likelihood provides a solution that is computationally feasible and robust against uncertainty in the phylogeny. The cost of this feasibility is that the method is an approximation when amino acid frequencies are highly skewed. The potential bias and its correction are discussed.
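
The posterior-based site-specific profile mentioned above can be illustrated with a minimal sketch. It assumes a simple two-class mixture in which each site is either functionally diverged (with prior weight `theta`, standing in for the coefficient of functional divergence) or not, and that the per-site likelihoods under each class have already been computed by some tree-based model. The function names, the `cutoff` value, and the numbers in the usage note are hypothetical illustrations, not the paper's actual algorithm or data.

```python
def site_posterior(theta, lik_diverged, lik_conserved):
    """Bayes rule for a two-class mixture at a single alignment site.

    theta         -- assumed prior probability that a site is functionally diverged
    lik_diverged  -- likelihood of the site pattern given the diverged class
    lik_conserved -- likelihood of the site pattern given the non-diverged class
    """
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    numerator = theta * lik_diverged
    denominator = numerator + (1.0 - theta) * lik_conserved
    if denominator == 0.0:
        raise ValueError("site has zero likelihood under both classes")
    return numerator / denominator


def site_profile(theta, site_likelihoods, cutoff=0.5):
    """Return (posteriors, flagged_sites) for a list of per-site likelihood pairs.

    site_likelihoods -- list of (lik_diverged, lik_conserved) tuples, one per site
    cutoff           -- hypothetical posterior threshold for flagging a site
    """
    posteriors = [site_posterior(theta, d, c) for d, c in site_likelihoods]
    flagged = [i for i, p in enumerate(posteriors) if p > cutoff]
    return posteriors, flagged
```

For instance, calling `site_profile(0.4, [(0.03, 0.01), (0.01, 0.05)])` ranks the first site above the second, because its pattern is better explained by the diverged class. In practice the cutoff would need to be chosen with the estimated coefficient and its sampling error in mind.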
