Huvariome: a web server resource of whole genome next-generation sequencing allelic frequencies to aid in pathological candidate gene selection

Background
Next generation sequencing provides clinical research scientists with a direct readout of innumerable variants, including personal, pathological and common benign variants. The aim of resequencing studies is to determine the candidate pathogenic variants from individual genomes, or from family-based or tumor/normal genome comparisons. Whilst appropriate controls within the experimental design will minimize the number of false positive variations selected, this number can be reduced further by using high quality whole genome reference data to filter out false positive variants prior to candidate gene selection. In addition, platform-related sequencing error models can help in the recovery of ambiguous genotypes from lower coverage data.

Description
We have developed a whole genome database of human genetic variations, Huvariome, determined from whole genome deep sequencing data with high coverage and low error rates. The database was designed to be sequencing technology independent but is currently populated with 165 individual whole genomes, consisting of small pedigrees and matched tumor/normal samples sequenced with the Complete Genomics sequencing platform. Common variants have been determined for a Benelux population cohort and are represented as genotypes alongside the results of two sets of control data (73 of the 165 genomes): Huvariome Core, which comprises 31 healthy individuals from the Benelux region, and the Diversity Panel, consisting of 46 healthy individuals representing 10 different populations and 21 samples in three pedigrees. Users can query the database by gene or position via a web interface, and the results are displayed as the frequency of the variations as detected in the datasets. We demonstrate that Huvariome can provide accurate reference allele frequencies to disambiguate sequencing inconsistencies produced in resequencing experiments. Huvariome has been used to support the selection of candidate cardiomyopathy-related genes which have a homozygous genotype in the reference cohorts. The database allows users to see which selected variants are common variants (> 5% minor allele frequency) in the Huvariome Core samples, thus aiding in the selection of potentially pathogenic variants by filtering out common variants that are not listed in one of the other public genomic variation databases. The no-call rate and the accuracy of allele calling in Huvariome allow the user to identify platform-dependent errors associated with specific regions of the human genome.

Conclusion
Huvariome is a simple-to-use resource for validation of resequencing results obtained by NGS experiments. The high sequence coverage and low error rates provide scientists with the ability to remove false positive results from pedigree studies. Results are returned via a web interface that displays location-based genetic variation frequency, impact on protein function, association with known genetic variations and a quality score of the variation base derived from the Huvariome Core and Diversity Panel data. These results may be used to identify and prioritize rare variants that, for example, might be disease relevant. In testing the accuracy of the Huvariome database, alleles of a selection of ambiguously called coding single nucleotide variants were successfully predicted in all cases. Data protection of individuals is ensured by restricting access to patient-derived genomes from the host institution, which is relevant for future molecular diagnostics.
