Significance of nucleotide sequence alignments: a method for random sequence permutation that preserves dinucleotide and codon usage.

The similarity of two nucleotide sequences is often expressed in terms of evolutionary distance, a measure of the amount of change needed to transform one sequence into the other. Given two sequences with a small distance between them, can their similarity be explained by their base composition alone? The nucleotide order of these sequences contributes to their similarity if the distance is much smaller than their average permutation distance, which is obtained by calculating the distances for many random permutations of these sequences. To determine whether their similarity can be explained by their dinucleotide and codon usage, random sequences must be chosen from the set of permuted sequences that preserve dinucleotide and codon usage. The problem of choosing random dinucleotide- and codon-preserving permutations can be expressed in the language of graph theory as the problem of generating random Eulerian walks on a directed multigraph. An efficient algorithm for generating such walks is described. This algorithm can be used to choose random sequence permutations that preserve (1) dinucleotide usage, (2) dinucleotide and trinucleotide usage, or (3) dinucleotide and codon usage. For example, the similarity of two 60-nucleotide DNA segments from the human beta-1 interferon gene (nucleotides 196-255 and 499-558) is not just the result of their nonrandom dinucleotide and codon usage.
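The Eulerian-walk formulation above can be illustrated for case (1), dinucleotide usage only. The Python sketch below is a minimal illustration under stated assumptions, not necessarily the algorithm as the paper describes it. Each nucleotide is a vertex and each adjacent pair is a directed edge, so any walk that uses every edge exactly once spells a permutation with the same dinucleotide counts. The function names `dinucleotide_shuffle` and `_reaches` are hypothetical, and the retry loop that picks a "last exit" edge for each vertex is an assumed, simple way to make sure the walk can always be completed.

```python
import random
from collections import Counter, defaultdict


def _reaches(v, target, exit_edge):
    """Follow the chosen exit edges from v; True if they lead to target."""
    seen = set()
    while v != target:
        if v in seen:          # ran into a cycle that avoids the target
            return False
        seen.add(v)
        v = exit_edge[v]
    return True


def dinucleotide_shuffle(seq, rng=random):
    """Return a random permutation of seq with identical dinucleotide counts.

    Hypothetical helper: builds the directed multigraph of adjacent pairs
    and reads the new sequence off a random Eulerian walk through it.
    """
    if len(seq) < 3:
        return seq
    last = seq[-1]

    # One edge a -> b for every adjacent pair "ab" in the sequence.
    successors = defaultdict(list)
    for a, b in zip(seq, seq[1:]):
        successors[a].append(b)
    vertices = [v for v in successors if v != last]

    # Assumption: pick one exit edge per vertex (except the final one) and
    # retry until following those edges always ends at the final vertex.
    while True:
        exit_edge = {v: rng.choice(successors[v]) for v in vertices}
        if all(_reaches(v, last, exit_edge) for v in vertices):
            break

    # Shuffle each vertex's remaining edges, keeping its exit edge for last.
    order = {}
    for v, succ in successors.items():
        rest = list(succ)
        if v in exit_edge:
            rest.remove(exit_edge[v])
            rng.shuffle(rest)
            rest.append(exit_edge[v])
        else:
            rng.shuffle(rest)
        order[v] = rest

    # Walk the graph from the first nucleotide, using each edge once.
    out = [seq[0]]
    used = Counter()
    current = seq[0]
    for _ in range(len(seq) - 1):
        nxt = order[current][used[current]]
        used[current] += 1
        out.append(nxt)
        current = nxt
    return "".join(out)
```

A permutation produced this way keeps the first and last nucleotides of the input, and therefore its base composition as well. To estimate an average permutation distance, one would call `dinucleotide_shuffle` many times on each sequence and average a distance function of one's choice over the shuffled pairs. Preserving trinucleotide or codon usage, cases (2) and (3), needs a different graph construction that this sketch does not cover.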
