IC4R-2.0: Rice Genome Reannotation Using Massive RNA-seq Data

Genome reannotation aims for complete and accurate characterization of gene models and is therefore critical for in-depth exploration of gene function. Although massive RNA-seq data offer great opportunities for gene model refinement, few efforts have been made to use these data in rice genome reannotation. Here we reannotate the rice (Oryza sativa L. ssp. japonica) genome by integrating large-scale RNA-seq data and release a new annotation system, IC4R-2.0. IC4R-2.0 significantly improves the completeness of gene structures, identifies a number of novel genes, and integrates a variety of functional annotations. Furthermore, long non-coding RNAs (lncRNAs) and circular RNAs (circRNAs) are systematically characterized in the rice genome. Performance evaluation shows that, compared with previous annotation systems, IC4R-2.0 achieves higher integrity and quality, primarily attributable to the massive RNA-seq data applied in genome annotation. We incorporate the improved annotations into the Information Commons for Rice (IC4R), a database integrating multiple omics data of rice, and update IC4R accordingly by providing more user-friendly web interfaces and implementing a series of practical online tools. Together, the updated IC4R, equipped with the improved annotations, holds great promise for comparative and functional genomic studies in rice and other monocotyledonous species. The IC4R-2.0 annotation system and related resources are freely accessible at http://ic4r.org/.
