A web-server framework to explore and visualize large genomic variation data in lab and its applications to wheat and its progenitors

Background

The cost of high-throughput sequencing is decreasing rapidly, allowing researchers in the post-genomic era to investigate genomic variation across hundreds or even thousands of samples. Managing and exploring such large-scale genomic variation data, however, requires programming skills. The public genotype-query databases of many species are usually centralized and implemented independently, which makes them difficult to update with new data over time. Currently, no widely used framework exists for setting up user-friendly web servers to explore new genomic variation data in diverse species.

Results

Here, we present SnpHub, a Shiny/R-based server framework for retrieving, analysing and visualizing large-scale genomic variation data that can be easily set up on any Linux server. After a pre-building step based on the provided VCF files and genome annotation files, the local server lets users interactively access SNPs/INDELs and their annotation information through a webpage, querying by locus or gene and for user-defined sample sets. Users can freely analyse and visualize genomic variations as heatmaps, phylogenetic trees, haplotype networks or geographical maps. Sample-specific sequences, in which reference bases are replaced by the sample's SNPs/INDELs, can also be retrieved.

Conclusions

SnpHub can be applied to any species. We built a SnpHub portal website for wheat and its progenitors based on published data from existing studies. SnpHub and its tutorial are available at http://guoweilong.github.io/SnpHub/.
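The two retrieval operations described above, querying variants by locus for a user-defined sample set and reconstructing a sample-specific sequence, can be illustrated with a small pure-Python sketch. This is not SnpHub's implementation (SnpHub is built on Shiny/R and pre-processes the VCF files). It only shows, under simplifying assumptions, what these operations involve. The function names, the toy VCF records, the chromosome name `chr1A` and the reference fragment are all hypothetical. The sketch also assumes that heterozygous, missing and homozygous-reference calls keep the reference base.

```python
# Hypothetical sketch, not SnpHub's code: query a toy VCF by locus and
# sample set, then rebuild one sample's sequence from its variants.
from dataclasses import dataclass


@dataclass
class Variant:
    chrom: str
    pos: int                    # 1-based position, as in VCF
    ref: str
    alts: list[str]
    genotypes: dict[str, str]   # sample name -> GT string, e.g. "0/1"


def parse_vcf(lines):
    """Parse uncompressed VCF text lines into (sample names, variants)."""
    samples, variants = [], []
    for line in lines:
        if line.startswith("##") or not line.strip():
            continue  # skip meta-information lines and blank lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns follow the FORMAT column
            continue
        chrom, pos, _id, ref, alt = fields[:5]
        gt_index = fields[8].split(":").index("GT")
        genotypes = {s: f.split(":")[gt_index] for s, f in zip(samples, fields[9:])}
        variants.append(Variant(chrom, int(pos), ref, alt.split(","), genotypes))
    return samples, variants


def query(variants, chrom, start, end, samples):
    """Variants on `chrom` with start <= pos <= end; genotypes kept only for `samples`."""
    return [
        Variant(v.chrom, v.pos, v.ref, v.alts, {s: v.genotypes[s] for s in samples})
        for v in variants
        if v.chrom == chrom and start <= v.pos <= end
    ]


def sample_sequence(ref_seq, region_start, variants, sample):
    """Replace reference bases with the sample's homozygous ALT alleles.

    ref_seq holds the reference starting at 1-based position region_start.
    Assumption: heterozygous, missing and homozygous-REF calls keep the
    reference; variants overlapping an already applied one are skipped.
    """
    pieces, cursor = [], 0  # cursor: 0-based index of the next unconsumed ref base
    for v in sorted(variants, key=lambda v: v.pos):
        alleles = v.genotypes[sample].replace("|", "/").split("/")
        if "." in alleles or len(set(alleles)) != 1 or alleles[0] == "0":
            continue
        offset = v.pos - region_start
        if offset < cursor or offset + len(v.ref) > len(ref_seq):
            continue
        if ref_seq[offset:offset + len(v.ref)] != v.ref:
            raise ValueError(f"REF mismatch at {v.chrom}:{v.pos}")
        pieces.append(ref_seq[cursor:offset])       # unchanged reference stretch
        pieces.append(v.alts[int(alleles[0]) - 1])  # the sample's ALT allele
        cursor = offset + len(v.ref)
    pieces.append(ref_seq[cursor:])
    return "".join(pieces)


# Toy data (hypothetical): a 10-bp reference fragment starting at chr1A:1.
REF = "ACGTACAGTC"
VCF = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n"
    "chr1A\t3\t.\tG\tT\t.\tPASS\t.\tGT\t1/1\t0/0\t0/1\n"
    "chr1A\t6\t.\tCA\tC\t.\tPASS\t.\tGT\t0/0\t1|1\t./.\n"
    "chr1A\t9\t.\tT\tTGG\t.\tPASS\t.\tGT\t1/1\t1/1\t0/0\n"
)

if __name__ == "__main__":
    samples, variants = parse_vcf(VCF.splitlines())
    hits = query(variants, "chr1A", 4, 10, ["S1", "S2"])
    print([(v.pos, v.genotypes) for v in hits])
    for s in samples:
        print(s, sample_sequence(REF, 1, variants, s))
```

With this toy input, S1 receives the SNP at position 3 and the insertion at position 9 (ACTTACAGTGGC). S2 receives the deletion at position 6 and the insertion at position 9 (ACGTACGTGGC). S3 keeps the reference because its calls are heterozygous, missing or reference. Scanning the whole file on every request would not scale to large sample panels. An indexed lookup, for example tabix on a bgzip-compressed VCF, reads only the records in the requested window.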
