Estimates of DNA and protein sequence divergence: an examination of some assumptions.

Some of the assumptions underlying estimates of DNA and protein sequence divergence are examined. A solution for the variance of these estimates that allows for different mutation rates and different population sizes in each species and for an arbitrary structure in the initial population is obtained. It is shown that these conditions do not strongly affect estimates of divergence. In general, they cause the variance of divergence to be smaller than a binomial variance. Thus, the binomial variance that is usually assumed for these estimates is safely conservative. It is shown that variability in the mutation rate among sites can have an effect as large as or larger than variability in the mutation rate among bases. Variability in the mutation rate among bases and among sites causes the number of substitutions between two sequences to be underestimated. Protein and DNA sequences from several species are collected to estimate the variability in mutation rates among sites. When many homologous sequences are known, standard methods can be used to estimate this variability. The estimates show that this factor is important when considering the spectrum of spontaneous mutations and is strongly reflected in the divergence of sequences. Smaller variability is found for the third position of codons than for the first and second codon positions. This may be because selective constraints are weaker at this position or because the third position has been saturated with mutations for the sequences examined.
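The underestimation described above can be illustrated with a minimal sketch. It assumes the standard Jukes-Cantor correction for equal rates and the widely used gamma-rate variant of that correction; these textbook formulas are swapped in for illustration and are not necessarily the estimators derived in the paper. The function names, the gamma `shape` parameter, and the example values are hypothetical.

```python
import math


def jc_distance(p):
    """Jukes-Cantor substitutions per site, assuming every site mutates at the same rate.

    p is the observed proportion of sites that differ between two sequences.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def gamma_distance(p, shape):
    """Jukes-Cantor-type distance when site rates follow a gamma distribution.

    shape is the gamma shape parameter (an assumption of this sketch);
    a smaller shape means more rate variability among sites.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the correction to be defined")
    if shape <= 0.0:
        raise ValueError("shape must be positive")
    return 0.75 * shape * ((1.0 - 4.0 * p / 3.0) ** (-1.0 / shape) - 1.0)


def binomial_variance(p, n_sites):
    """Binomial sampling variance of the observed proportion p over n_sites sites."""
    return p * (1.0 - p) / n_sites


if __name__ == "__main__":
    p = 0.30  # hypothetical observed proportion of differing sites
    print("equal rates:       ", jc_distance(p))
    print("gamma, shape = 1.0:", gamma_distance(p, 1.0))
    print("binomial variance: ", binomial_variance(p, 100))
```

For the same observed proportion of differences, the gamma-rate estimate exceeds the equal-rate estimate, so ignoring rate variability among sites yields too few inferred substitutions. As `shape` grows large, the two estimates converge.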
