Burden Testing of Rare Variants Identified through Exome Sequencing via Publicly Available Control Data.

The genetic causes of many Mendelian disorders remain undefined. For many disorders, gene discovery is hampered by factors such as a lack of large multiplex families, locus heterogeneity, and incomplete penetrance. Previous work suggests that gene-based burden testing, in which the aggregate burden of rare, protein-altering variants in each gene is compared between case and control subjects, might overcome some of these limitations. The increasing availability of large-scale public sequencing databases such as the Genome Aggregation Database (gnomAD) makes it possible to use these databases as controls for burden testing, obviating the need for additional control sequencing in each study. However, using public databases as controls poses several challenges, including the lack of individual-level data, differences in ancestry, and differences in sequencing platforms and data processing. To illustrate the approach, we analyzed whole-exome sequencing data from 393 individuals with idiopathic hypogonadotropic hypogonadism (IHH), a rare disorder with significant locus heterogeneity and incomplete penetrance, against control subjects from gnomAD (n = 123,136). We used presumably benign synonymous variants to calibrate our approach. Through iterative analyses, we systematically addressed and overcame various sources of artifact that can arise when using public control data. In particular, we introduce an approach for highly adaptable variant quality filtering that leads to well-calibrated results. Our approach "re-discovered" genes previously implicated in IHH (FGFR1, TACR3, GNRHR). Furthermore, we identified a significant burden in TYRO3, a gene implicated in hypogonadotropic hypogonadism in mice. Finally, we developed a user-friendly software package, TRAPD (Test Rare vAriants with Public Data), for performing gene-based burden testing against public databases.
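As a rough illustration of the per-gene comparison described above, the following minimal sketch computes a one-sided Fisher's exact test on carrier counts. The abstract does not state which test statistic is used, so the choice of Fisher's exact test is an assumption, as are the function name `burden_pvalue` and the example carrier counts; only the cohort sizes (393 cases, 123,136 controls) come from the text. Because public databases lack individual-level data, the control carrier count would in practice have to be approximated from summary allele counts, which this sketch does not attempt.

```python
from math import comb


def burden_pvalue(case_carriers, n_cases, control_carriers, n_controls):
    """One-sided Fisher's exact test for carrier enrichment in cases.

    Returns P(X >= case_carriers), where X is the number of carriers that
    fall among the cases under the hypergeometric null (carrier status
    independent of case/control status). Illustrative sketch only.
    """
    total = n_cases + n_controls
    carriers = case_carriers + control_carriers
    # Number of ways to choose which individuals are the cases.
    denom = comb(total, n_cases)
    # X cannot exceed the number of carriers or the number of cases.
    upper = min(carriers, n_cases)
    tail = sum(
        comb(carriers, x) * comb(total - carriers, n_cases - x)
        for x in range(case_carriers, upper + 1)
    )
    # Exact integer arithmetic until the final division.
    return tail / denom


# Hypothetical counts for a single gene; cohort sizes are from the abstract.
p = burden_pvalue(case_carriers=8, n_cases=393,
                  control_carriers=150, n_controls=123136)
print(f"one-sided burden p-value: {p:.3g}")
```

Running the same function on synonymous variants only, where no true association is expected, is one way to check whether the resulting p-values look roughly uniform before trusting the protein-altering results.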
