Improving the Annotation of Arabidopsis lyrata Using RNA-Seq Data

Gene model annotations are important community resources that ensure comparability and reproducibility of analyses, and they are typically the first step in the functional annotation of genomic regions. Without up-to-date genome annotations, genome sequences cannot be used to maximum advantage. It is therefore essential to update gene annotations regularly by integrating the latest information, so that reference annotations remain a common basis for various types of analyses. Here, we report an improvement of the Arabidopsis lyrata gene annotation using extensive RNA-seq data. The new annotation consists of 31,132 protein-coding gene models in addition to 2,089 genes with high similarity to transposable elements. Overall, ~87% of the gene models are corroborated by evidence of expression, and 2,235 of these models feature multiple transcripts. Our updated gene annotation corrects hundreds of incorrectly split or merged gene models in the original annotation, and as a result the identification of alternative splicing events and differential isoform usage is vastly improved.
