Variant-DB: A Tool for Efficiently Exploring Millions of Human Genetic Variants and Their Annotations

Next-generation sequencing (NGS) allows a human genome to be sequenced within hours, enabling large-scale applications such as sequencing the genome of each patient in a clinical study. Each individual human genome differs from the so-called reference genome, the consensus genome of a healthy human, at about 3.5 million positions. These differences, called variants, determine individual phenotypes, and certain variants are known to indicate disease predispositions. Associating variant patterns and affected genes with these diseases requires the combined analysis of variants from multiple individuals and hence efficient solutions for accessing and filtering the variant data. We present Variant-DB, our in-house database solution that provides such efficient access to millions of variants from hundreds to thousands of individuals. Variant-DB stores individual variant genotypes and annotations. It features a REST API and a web-based front-end for filtering variants by annotation, individual, family and study. We explain Variant-DB and its front-end and demonstrate how the Variant-DB API can be included in data integration workflows.
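To make the idea of filtering variants through a REST API more concrete, the following is a minimal sketch of how a client in a data integration workflow might assemble such a request. The abstract does not specify the Variant-DB endpoints, so the base URL, the parameter names (`study`, `gene`, `max_af`, `individual`) and the record fields (`gene`, `af`) are all assumptions made for illustration only; the actual API may look quite different. The second function applies the same filter to records that have already been fetched.

```python
from urllib.parse import urlencode

# Hypothetical base URL; the real Variant-DB endpoint is not given in the text.
BASE_URL = "https://variantdb.example.org/api/variants"


def build_query_url(study, gene=None, max_allele_freq=None, individuals=()):
    """Assemble a filter query URL (all parameter names are assumptions)."""
    params = [("study", study)]
    if gene is not None:
        params.append(("gene", gene))
    if max_allele_freq is not None:
        params.append(("max_af", max_allele_freq))
    # One repeated query parameter per requested individual.
    for individual in individuals:
        params.append(("individual", individual))
    return f"{BASE_URL}?{urlencode(params)}"


def filter_variants(variants, gene=None, max_allele_freq=None):
    """Apply the same filter client-side to already fetched variant records.

    Each record is assumed to be a dict with the keys "gene" and "af"
    (allele frequency); this layout is illustrative, not Variant-DB's schema.
    """
    kept = []
    for variant in variants:
        if gene is not None and variant["gene"] != gene:
            continue
        if max_allele_freq is not None and variant["af"] > max_allele_freq:
            continue
        kept.append(variant)
    return kept


if __name__ == "__main__":
    # Toy request: rare variants in one gene for two individuals of a study.
    print(build_query_url("study1", gene="LRRK2", max_allele_freq=0.01,
                          individuals=["P1", "P2"]))
```

In a real workflow the URL would be passed to an HTTP client and the JSON response handed on to downstream tools; keeping URL construction separate from fetching makes the filter logic easy to test without network access.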
