The size distribution of insertions and deletions in human and rodent pseudogenes suggests the logarithmic gap penalty for sequence alignment

The size distributions of deletions, insertions, and indels (i.e., insertions or deletions) were studied using 78 human processed pseudogenes and other published data sets. The following results were obtained: (1) Deletions occur more frequently than insertions in sequence evolution; none of the pseudogenes studied shows significantly more insertions than deletions. (2) Empirically, the size distributions of deletions, insertions, and indels are described well by a power law, $f_k = C k^{-b}$, where $f_k$ is the frequency of deletions, insertions, or indels with gap length $k$, $b$ is the power parameter, and $C$ is the normalization factor. (3) The estimates of $b$ for deletions and insertions from the same data set are approximately equal, indicating that the size distributions of deletions and insertions are approximately identical. (4) The variation in the estimates of $b$ among data sets is small, indicating that local structure has an effect but plays only a secondary role in the size distribution of deletions and insertions. (5) The linear gap penalty, which is the one most commonly used in sequence alignment, is not supported by our analysis; rather, the power law for the size distribution of indels suggests that an appropriate gap penalty is $w_k = a + b \ln k$, where $a$ is the gap creation cost and $b \ln k$ is the gap extension cost. (6) The higher frequency of deletions than of insertions suggests that the gap creation cost of an insertion ($a_i$) should be larger than that of a deletion ($a_d$); that is, $a_i - a_d = \ln R$, where $R$ is the frequency ratio of deletions to insertions.
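
One way to see the link between results (2) and (5): if a gap of length k is penalized by the negative logarithm of its frequency, the power law gives -ln f_k = -ln C + b ln k, which has the logarithmic form a + b ln k. The Python sketch below illustrates this under stated assumptions. The least-squares fit on log-log axes is a choice made here for illustration, not necessarily the estimation procedure used in the study, and the function names and all numeric values are hypothetical.

```python
import math


def fit_power_law(counts):
    """Estimate (C, b) in f_k = C * k**(-b) from gap-length counts.

    `counts` maps gap length k (>= 1) to the number of gaps observed.
    Assumption: ordinary least squares on ln f_k versus ln k; the
    abstract does not say how b was estimated.
    """
    total = sum(counts.values())
    xs = [math.log(k) for k in counts]
    ys = [math.log(c / total) for c in counts.values()]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # ln f_k = ln C - b ln k, so the slope is -b and the intercept is ln C.
    return math.exp(intercept), -slope


def log_gap_penalty(k, a, b):
    """Logarithmic gap penalty w_k = a + b * ln(k) for a gap of length k."""
    return a + b * math.log(k)


def insertion_creation_cost(a_del, ratio):
    """Creation cost for insertions, from a_i - a_d = ln R."""
    return a_del + math.log(ratio)


# Synthetic counts that follow a power law with b = 1.5 (made-up data).
synthetic = {k: 1000.0 * k ** -1.5 for k in range(1, 11)}
C_hat, b_hat = fit_power_law(synthetic)

# Hypothetical creation cost and deletion-to-insertion ratio.
a_d = 2.0
a_i = insertion_creation_cost(a_d, ratio=3.0)
penalties = [log_gap_penalty(k, a_d, b_hat) for k in (1, 2, 10)]
```

Under such a penalty, extending a gap from length k to k + 1 costs b ln((k + 1)/k), which shrinks as k grows, whereas a linear penalty charges the same amount for every added position.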
