The BIG Data Center: from deposition to integration to translation

Biological data are being generated at unprecedented, exponentially growing rates, posing considerable challenges in big data deposition, integration and translation. The BIG Data Center, established at Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, provides a suite of database resources, including (i) Genome Sequence Archive, a data repository specialized for archiving raw sequence reads, (ii) Gene Expression Nebulas, a data portal of gene expression profiles based entirely on RNA-Seq data, (iii) Genome Variation Map, a comprehensive collection of genome variations for featured species, (iv) Genome Warehouse, a centralized resource housing genome-scale data with particular focus on economically important animals and plants, (v) Methylation Bank, an integrated database of whole-genome single-base resolution methylomes and (vi) Science Wikis, a central access point for biological wikis developed for community annotations. The BIG Data Center is dedicated to constructing and maintaining biological databases through big data integration and value-added curation, conducting basic research to translate big data into big knowledge and providing free, open access to a variety of data resources in support of worldwide research activities in both academia and industry. All of these resources are publicly available at http://bigd.big.ac.cn.
