SNP-Seek II: A resource for allele mining and analysis of big genomic data in Oryza sativa

Abstract

The 3000 Rice Genomes Project generated a large dataset of genomic variation for the world’s most important crop, Oryza sativa L. Using Burrows-Wheeler Aligner (BWA) read alignment and Genome Analysis Toolkit (GATK) variant calling on this dataset, we identified ∼40 M single-nucleotide polymorphisms (SNPs). Five rice reference genomes representing the major variety groups were used: Nipponbare (temperate japonica), IR 64 (indica), 93-11 (indica), DJ 123 (aus), and Kasalath (aus). The results are accessible through the Rice SNP-Seek Database (http://snp-seek.irri.org) and through the web services of its application programming interface (API). We incorporated legacy phenotypic and passport data for the sequenced varieties, originating from the International Rice Genebank Collection Information System (IRGCIS), as well as gene models from several rice annotation projects. The massive genotypic data in SNP-Seek are stored in hierarchical data format 5 (HDF5) files for quick retrieval. Germplasm, phenotypic, and genomic data are stored in a relational database management system (RDBMS) using the Chado schema, which allows controlled vocabularies from biological ontologies to be used as query constraints in SNP-Seek. In this paper, we describe the datasets stored in SNP-Seek, the architecture of the database and web application, and the interoperability methodologies in place, and we present a few use cases demonstrating the utility of SNP-Seek for diversity analysis and molecular breeding.
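
To illustrate how controlled-vocabulary terms can act as query constraints in a Chado-style relational store, the following is a minimal sketch rather than the SNP-Seek implementation. It uses an in-memory SQLite database with a heavily simplified subset of Chado-like tables (cv, cvterm, stock, stock_cvterm); the columns are reduced for brevity, and the vocabulary name, term names, variety names, and the helper functions build_demo_db and stocks_with_term are all hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a simplified, Chado-like layout.

    All rows inserted here are invented for illustration only.
    """
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE cv (cv_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE cvterm (
            cvterm_id INTEGER PRIMARY KEY,
            cv_id INTEGER REFERENCES cv(cv_id),
            name TEXT
        );
        CREATE TABLE stock (stock_id INTEGER PRIMARY KEY, uniquename TEXT UNIQUE);
        CREATE TABLE stock_cvterm (
            stock_id INTEGER REFERENCES stock(stock_id),
            cvterm_id INTEGER REFERENCES cvterm(cvterm_id)
        );
        """
    )
    # Hypothetical controlled vocabulary and terms.
    con.execute("INSERT INTO cv VALUES (1, 'demo_trait_ontology')")
    con.executemany(
        "INSERT INTO cvterm VALUES (?, 1, ?)",
        [(1, "drought tolerance"), (2, "blast resistance")],
    )
    # Hypothetical varieties (stocks) and their term annotations.
    con.executemany(
        "INSERT INTO stock VALUES (?, ?)",
        [(1, "VARIETY_A"), (2, "VARIETY_B"), (3, "VARIETY_C")],
    )
    con.executemany(
        "INSERT INTO stock_cvterm VALUES (?, ?)",
        [(1, 1), (2, 2), (3, 1)],
    )
    return con


def stocks_with_term(con, cv_name, term_name):
    """Return names of stocks annotated with the given vocabulary term."""
    rows = con.execute(
        """
        SELECT s.uniquename
        FROM stock AS s
        JOIN stock_cvterm AS sc ON sc.stock_id = s.stock_id
        JOIN cvterm AS t ON t.cvterm_id = sc.cvterm_id
        JOIN cv ON cv.cv_id = t.cv_id
        WHERE cv.name = ? AND t.name = ?
        ORDER BY s.uniquename
        """,
        (cv_name, term_name),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    con = build_demo_db()
    print(stocks_with_term(con, "demo_trait_ontology", "drought tolerance"))
```

Because the constraint is expressed as a vocabulary name plus a term name instead of free text, the same query shape would apply to any ontology loaded into the cv and cvterm tables.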
