A method for identification of highly conserved elements and evolutionary analysis of superphylum Alveolata

Background: Perfectly or highly conserved DNA elements have been found in vertebrates, invertebrates, and plants by various methods. However, little is known about such elements in protists. The evolutionary distance between apicomplexans can be very large, in particular because of the positive selection pressure on them. This complicates the identification of highly conserved elements in alveolates, a difficulty that the proposed algorithm overcomes.

Results: A novel algorithm is developed to identify highly conserved DNA elements. It is based on the identification of dense subgraphs in a specially built multipartite graph whose parts correspond to genomes. The algorithm relies neither on genome alignments nor on pre-identified perfectly conserved elements; instead, it performs a fast search for pairs of words of maximum length, taken from different genomes, whose edit distance is below a specified threshold. Each such pair defines an edge whose weight equals the maximum (or total) length of the words assigned to its ends. The graph composed of these edges is then compacted by merging some of its edges and vertices. The dense subgraphs are identified by a cellular automaton-like algorithm; each subgraph defines a cluster composed of similar inextensible words from different genomes. Almost all clusters are taken as predicted highly conserved elements. The algorithm is applied to the nuclear genomes of the superphylum Alveolata, and the corresponding phylogenetic tree is built and discussed.

Conclusion: We propose an algorithm for the identification of highly conserved elements. The large set of identified elements was used to infer the phylogeny of Alveolata.
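The graph construction described in the Results can be illustrated with a minimal Python sketch. It is not the authors' implementation and simplifies heavily: words have a fixed length `k` rather than maximum length, the search is brute force rather than fast, the threshold is applied as "at most `max_dist`", the compaction step is omitted, and connected components (via union-find) stand in for the cellular automaton-like dense-subgraph search. All function and parameter names are hypothetical.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def words(seq: str, k: int):
    """All (position, word) pairs of length k in seq (assumption: fixed k)."""
    return [(pos, seq[pos:pos + k]) for pos in range(len(seq) - k + 1)]


def build_edges(genomes: dict, k: int, max_dist: int):
    """Edges join similar words from DIFFERENT genomes (multipartite graph).

    A vertex is (genome name, position); the edge weight is the maximum
    length of the two words, one of the two weightings the abstract mentions.
    """
    edges = []
    for (g1, s1), (g2, s2) in combinations(genomes.items(), 2):
        for p1, w1 in words(s1, k):
            for p2, w2 in words(s2, k):
                if edit_distance(w1, w2) <= max_dist:
                    edges.append(((g1, p1), (g2, p2), max(len(w1), len(w2))))
    return edges


def clusters(edges):
    """Stand-in clustering: connected components via union-find."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v, _weight in edges:
        parent[find(u)] = find(v)

    groups = {}
    for v in list(parent):
        groups.setdefault(find(v), set()).add(v)
    return list(groups.values())


if __name__ == "__main__":
    toy = {"A": "ACGTACGT", "B": "ACGAACGT", "C": "TTTTTTTT"}
    found = build_edges(toy, k=8, max_dist=1)
    print(found)
    print(clusters(found))
```

In this toy input, genomes A and B differ by a single substitution, so they yield one edge and one cluster, while C contributes nothing. Connected components are far more permissive than a dense-subgraph criterion, so on real data this stand-in would merge clusters that the method described above keeps apart.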
