The Rice Genome Knowledgebase (RGKbase): an annotation database for rice comparative genomics and evolutionary biology

Over the past 10 years, genomes of cultivated rice varieties and their wild counterparts have been sequenced, although most efforts have focused on genome assembly and annotation of the two major cultivated rice (Oryza sativa L.) subspecies, indica (93-11) and japonica (Nipponbare). To integrate information from genome assemblies and annotations for better analysis and application, we introduce a comparative rice genome database, the Rice Genome Knowledgebase (RGKbase, http://rgkbase.big.ac.cn/RGKbase/). RGKbase has three major components: (i) integrated data curation for rice genomics and molecular biology, which includes genome sequence assemblies, transcriptomic and epigenomic data, genetic variations, quantitative trait loci (QTLs) and the relevant literature; (ii) user-friendly viewers, such as GBrowse, GeneBrowse and Circos, for genome annotations and evolutionary dynamics; and (iii) bioinformatic tools for compositional and synteny analyses, gene family classifications, gene ontology terms and pathways, and gene co-expression networks. RGKbase currently includes data from five rice cultivars and species: Nipponbare (japonica), 93-11 (indica), PA64s (indica), the African rice (Oryza glaberrima) and a wild rice species (Oryza brachyantha). We are also constantly introducing new datasets from a variety of public efforts, such as two recent releases of sequence data from ∼1000 rice varieties, which are mapped onto the reference genome, yielding ample high-quality single-nucleotide polymorphisms and insertions–deletions.
