Global analysis of repetitive DNA from unassembled sequence reads using RepeatExplorer2

RepeatExplorer2 is a novel version of a computational pipeline that uses graph-based clustering of next-generation sequencing reads for characterization of repetitive DNA in eukaryotes. The clustering algorithm facilitates repeat identification in any genome by using relatively small quantities of short sequence reads, and additional tools within the pipeline perform automatic annotation and quantification of the identified repeats. The pipeline is integrated into the Galaxy platform, which provides a user-friendly web interface for script execution and documentation of the results. Compared to the original version of the pipeline, RepeatExplorer2 provides automated annotation of transposable elements, identification of tandem repeats and enhanced visualization of analysis results. Here, we present an overview of the RepeatExplorer2 workflow and provide procedures for its application to (i) de novo repeat identification in a single species, (ii) comparative repeat analysis in a set of species, (iii) development of satellite DNA probes for cytogenetic experiments and (iv) identification of centromeric repeats based on ChIP-seq data. Each procedure takes approximately 2 d to complete. RepeatExplorer2 is available at https://repeatexplorer-elixir.cerit-sc.cz.
