Identification and annotation of repetitive sequences in fungal genomes.

Advances in sequencing technologies have fundamentally changed the pace of genome sequencing projects and have contributed to an ever-increasing volume of genomic data. This has been paralleled by increases in the computational power and resources needed to process raw sequence data and translate it into meaningful information. Beyond protein-coding regions, every genome studied so far has been found to contain repetitive sequences. Once dismissed as "junk," these sequences have since been implicated by numerous studies in important biological and structural roles in the genome. The identification and characterization of repetitive sequences has therefore become an indispensable part of genome sequencing projects. Numerous similarity-based and de novo methods have been developed to search for and annotate repeats in the genome, many of which are discussed in this chapter.
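As a rough illustration of two de novo ideas, the sketch below flags stretches of a sequence covered by high-copy k-mers and scans for perfect tandem repeats. It is a toy written in Python and assumes plain uppercase ACGT strings. The helper names (`kmer_counts`, `repeat_mask`, `exact_tandem_repeats`), the k-mer length and the copy thresholds are hypothetical choices made for illustration. They do not reproduce the algorithm of any particular tool discussed in the chapter. Production tools tolerate mismatches and indels, consider reverse complements, and scale to whole assemblies.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in a DNA string (forward strand only)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def repeat_mask(seq: str, k: int = 12, min_count: int = 3) -> list[tuple[int, int]]:
    """Crude de novo repeat signal (hypothetical helper): merged 0-based,
    half-open intervals covered by k-mers seen at least `min_count` times.
    Illustrative only; reverse complements and mismatches are ignored."""
    seq = seq.upper()
    counts = kmer_counts(seq, k)
    intervals: list[tuple[int, int]] = []
    for i in range(len(seq) - k + 1):
        if counts[seq[i:i + k]] < min_count:
            continue
        start, end = i, i + k
        if intervals and start <= intervals[-1][1]:
            # Overlaps or abuts the previous interval: extend it.
            intervals[-1] = (intervals[-1][0], max(intervals[-1][1], end))
        else:
            intervals.append((start, end))
    return intervals


def exact_tandem_repeats(seq: str, max_unit: int = 6,
                         min_copies: int = 3) -> list[tuple[int, str, int]]:
    """Greedy left-to-right scan for perfect tandem repeats (hypothetical helper).
    Returns (start, unit, copies); ties in total length favour the shorter unit."""
    seq = seq.upper()
    hits: list[tuple[int, str, int]] = []
    i, n = 0, len(seq)
    while i < n:
        best = None
        for unit_len in range(1, max_unit + 1):
            unit = seq[i:i + unit_len]
            if len(unit) < unit_len:  # too close to the end of the sequence
                break
            copies = 1
            while seq[i + copies * unit_len:i + (copies + 1) * unit_len] == unit:
                copies += 1
            span = copies * unit_len
            if copies >= min_copies and (best is None or span > best[2] * len(best[1])):
                best = (i, unit, copies)
        if best is not None:
            hits.append(best)
            i += best[2] * len(best[1])  # skip past the reported array
        else:
            i += 1
    return hits
```

Consider a 25-bp toy sequence built as GATTACA + CC + GATTACA + TT + GATTACA. On this input, `repeat_mask(toy, k=7, min_count=3)` reports the three copies as `[(0, 7), (9, 16), (18, 25)]`. On a separate input, `exact_tandem_repeats("GGCACACACAGG", max_unit=4, min_copies=3)` returns `[(2, 'CA', 4)]`. The k-mer approach needs no prior library, which is what makes it de novo. By contrast, similarity-based methods compare the sequence against collections of known repeat consensus sequences.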
