Automated de novo identification of repeat sequence families in sequenced genomes.

Repetitive sequences make up a major part of eukaryotic genomes. We have developed an approach for the de novo identification and classification of repeat sequence families that is based on extensions to the usual approach of single linkage clustering of local pairwise alignments between genomic sequences. Our extensions use multiple alignment information to define the boundaries of individual copies of the repeats and to distinguish homologous but distinct repeat element families. When tested on the human genome, our approach correctly identified and grouped known transposable elements. The program should be useful for first-pass automatic classification of repeats in newly sequenced genomes.
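
As a rough illustration of the baseline that the abstract builds on, the sketch below shows plain single linkage clustering: any two sequence elements joined by a pairwise alignment end up in the same family, directly or through a chain of alignments. It is not the authors' method and omits their multiple-alignment extensions (boundary definition and separation of related families). The function name `single_linkage_clusters`, the element labels, and the input format (a list of aligned pairs) are assumptions made for this example.

```python
from collections import defaultdict


def single_linkage_clusters(elements, alignments):
    """Group elements into families by single linkage.

    elements:   iterable of hashable element labels (hypothetical IDs).
    alignments: iterable of (a, b) pairs, one per significant local
                pairwise alignment between two elements (assumed input).
    Returns a sorted list of sorted families.
    """
    # Union-find: every element starts as its own family.
    parent = {e: e for e in elements}

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # One alignment is enough to merge two families (single linkage).
    for a, b in alignments:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    families = defaultdict(set)
    for e in parent:
        families[find(e)].add(e)
    return sorted(sorted(members) for members in families.values())


if __name__ == "__main__":
    elems = ["e1", "e2", "e3", "e4", "e5", "e6"]
    pairs = [("e1", "e2"), ("e2", "e3"), ("e4", "e5")]
    for family in single_linkage_clusters(elems, pairs):
        print(family)
```

Because one alignment suffices to merge two groups, a single spurious or partial alignment can fuse distinct families, which is the kind of weakness the abstract's extensions are meant to address.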
