Predicting disease-causing variant combinations

Significance

Until now, it has been difficult to directly assess the pathogenicity of variant combinations in multiple genes, even though this type of assessment can provide important benefits in identifying the genetic causes of rare diseases. The work presented in this paper addresses this problem with a machine-learning method that predicts the pathogenicity of variant combinations in gene pairs, based on known pathogenic data. We demonstrate the high accuracy of this method and its capacity to identify novel instances. The method's decision-making process is also made explicit, which is useful for clinical interpretation. This pioneering work will lead to toolboxes that help geneticists and clinicians counsel their patients more effectively.

Notwithstanding important advances in single-variant pathogenicity identification, novel breakthroughs in discerning the origins of many rare diseases require methods able to identify more complex genetic models. We present here the Variant Combinations Pathogenicity Predictor (VarCoPP), a machine-learning approach that identifies pathogenic variant combinations in gene pairs (called digenic or bilocus variant combinations). We show that the results produced by this method are highly accurate and precise, a finding confirmed by validating the method on recently published independent disease-causing data. We identify confidence labels of 95% and 99%, representing the probability that a bilocus combination is a true pathogenic result; these give geneticists rational markers to evaluate the most relevant pathogenic combinations and to limit the search space and time. Finally, VarCoPP has been designed as an interpretable method that can explain why a bilocus combination is predicted as pathogenic and which biological information is important for that prediction. This work provides an important step toward the genetic understanding of rare diseases, paving the way to clinical knowledge and improved patient care.
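
The 95% and 99% confidence labels can be pictured with a small sketch. The text above does not say how VarCoPP derives these labels, so the following is only one plausible reading, stated as an assumption: a cutoff is chosen so that at most 5% (or 1%) of the scores given to known neutral combinations lie above it, and a new combination is labeled by the strictest cutoff it exceeds. The function names, the toy scores, and the label strings are all hypothetical and are not taken from VarCoPP.

```python
from typing import Sequence


def zone_threshold(neutral_scores: Sequence[float], max_fp_rate: float) -> float:
    """Return a cutoff with at most `max_fp_rate` of neutral scores strictly above it.

    Assumption: a confidence zone is calibrated on scores of neutral
    combinations. This is an illustration, not the published procedure.
    """
    if not neutral_scores:
        raise ValueError("need at least one neutral score")
    if not 0.0 <= max_fp_rate <= 1.0:
        raise ValueError("max_fp_rate must lie in [0, 1]")
    ranked = sorted(neutral_scores, reverse=True)
    # Number of neutral scores allowed above the cutoff, rounded down so the
    # cutoff errs on the strict side.
    allowed = int(max_fp_rate * len(ranked))
    if allowed >= len(ranked):
        return float("-inf")
    return ranked[allowed]


def confidence_label(score: float, t95: float, t99: float) -> str:
    """Map a pathogenicity score to the strictest zone it falls in."""
    if score > t99:
        return "99%-zone"
    if score > t95:
        return "95%-zone"
    return "none"


# Toy neutral scores (hypothetical): 0.00, 0.01, ..., 0.99.
neutral = [i / 100 for i in range(100)]
t95 = zone_threshold(neutral, 0.05)
t99 = zone_threshold(neutral, 0.01)

for s in (0.995, 0.96, 0.50):
    print(s, confidence_label(s, t95, t99))
```

Under this reading, a geneticist would start with combinations in the 99% zone and move on to the 95% zone only if needed, which is how such labels could limit the search space.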
