Conservation and functional element discovery in 20 angiosperm plant genomes.

Here, we describe the construction of a phylogenetically deep, whole-genome alignment of 20 flowering plants, along with an analysis of plant genome conservation. Each included angiosperm genome was aligned to a reference genome, Arabidopsis thaliana, using the LASTZ/MULTIZ paradigm and tools from the University of California, Santa Cruz (UCSC) Genome Browser source code. In addition to the multiple alignment, we created a local genome browser displaying multiple tracks of newly generated genome annotation, as well as annotation drawn from data published by other research groups. An investigation into A. thaliana gene features present in the aligned A. lyrata genome revealed better conservation of start codons, stop codons, and splice sites within our alignments (51% of A. thaliana features conserved without interruption in A. lyrata) than in previous publicly available plant pairwise alignments (34% of features conserved). The detailed view of conservation across angiosperms revealed not only high coding-sequence conservation but also a large set of previously uncharacterized conserved intergenic sequences. From this, we annotated the collection of conserved features, revealing dozens of putative noncoding RNAs, including some with recorded small RNA expression. Comparing conservation between kingdoms revealed a faster decay of genome features in vertebrates than in angiosperms. Finally, conserved sequences were searched for folding RNA features, including but not limited to noncoding RNA (ncRNA) genes. Among these, we highlight a double hairpin in the 5'-untranslated region (5'-UTR) of the PRIN2 gene and a putative ncRNA with homology targeting the LAF3 protein.
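The feature-conservation comparison described above (the share of A. thaliana start codons, stop codons, and splice sites conserved without interruption in A. lyrata) can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names are hypothetical, the toy sequences are invented for illustration, and the definition of "conserved without interruption" used here (every feature base aligned to an identical base, with no gap in either row) is an assumption, since the abstract does not state the exact criterion.

```python
# Sketch only: scores reference features against one pairwise alignment.
# Assumption: a feature is "conserved without interruption" when every
# alignment column it spans holds identical bases and no gap in either row.

def ref_to_columns(ref_row):
    """Return the alignment column of each ungapped reference position."""
    return [col for col, base in enumerate(ref_row) if base != "-"]


def is_conserved(ref_row, qry_row, start, end):
    """Check the reference interval [start, end) (0-based, ungapped coordinates)."""
    cols = ref_to_columns(ref_row)
    if start < 0 or start >= end or end > len(cols):
        return False
    first, last = cols[start], cols[end - 1]
    for col in range(first, last + 1):
        r, q = ref_row[col].upper(), qry_row[col].upper()
        # A gap in either row, or a mismatch, interrupts the feature.
        if r == "-" or q == "-" or r != q:
            return False
    return True


def fraction_conserved(ref_row, qry_row, features):
    """Fraction of (start, end) reference features that are conserved."""
    if not features:
        return 0.0
    hits = sum(is_conserved(ref_row, qry_row, s, e) for s, e in features)
    return hits / len(features)


if __name__ == "__main__":
    # Toy alignment rows (invented); '-' marks a gap.
    ref = "ATGGC-TAAGT"
    qry = "ATGGCATA-GT"
    # Hypothetical features in ungapped reference coordinates:
    # a start codon, a stop codon, and a splice-site dinucleotide.
    features = [(0, 3), (5, 8), (8, 10)]
    print(fraction_conserved(ref, qry, features))
```

In this toy case the stop codon is interrupted by a gap in the query row, so two of the three features count as conserved. A real analysis would read alignment blocks from the MULTIZ output and feature coordinates from a gene annotation rather than from hard-coded strings.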
