Genome Variation Map: a worldwide collection of genome variations across multiple species

Abstract The Genome Variation Map (GVM; http://bigd.big.ac.cn/gvm/) is a public data repository of genome variations. It collects and integrates genome variations for a wide range of species, accepts submissions of different variation types from all over the world and provides free open access to all publicly available data in support of worldwide research activities. Compared with the previous version, the current version of GVM adds 22 species, 115 projects, 55 935 samples, 463 429 609 variants, 66 220 associations and 56 submissions (as of 7 September 2020). The current release houses a total of ∼960 million variants from 41 species, comprising 13 animals, 25 plants and 3 viruses. It also incorporates 64 819 individual genotypes and 260 393 manually curated high-quality genotype-to-phenotype associations. Since its inception, GVM has archived genomic variation data for 43 754 samples submitted by users worldwide and has served >1 million data download requests. As a core resource of the National Genomics Data Center, GVM provides valuable genome variations for a diversity of species and thus plays an important role in both functional genomics studies and molecular breeding.
