Missense variants in CFTR nucleotide-binding domains predict quantitative phenotypes associated with cystic fibrosis disease severity.

Predicting the impact of genetic variation on human health remains an important and difficult challenge. Often, algorithmic classifiers are tasked with predicting binary traits (e.g., positive or negative for a disease) from missense variation. Though useful, this arrangement is limiting and contrived, because human diseases often comprise a spectrum of severities rather than a discrete partitioning of patient populations. Furthermore, labeling variants as causal or benign can be error-prone, which is problematic for training supervised learning algorithms (the so-called "garbage in, garbage out" phenomenon). We explore the potential value of training classifiers using continuous-valued quantitative measurements rather than binary traits. Using 20 variants from cystic fibrosis transmembrane conductance regulator (CFTR) nucleotide-binding domains and six quantitative measures of cystic fibrosis (CF) severity, we trained classifiers to predict CF severity from CFTR variants. Under cross-validation, classifier predictions and measured clinical/functional values were significantly correlated for four of six quantitative traits (correlation P-values from 1.35 × 10⁻⁴ to 4.15 × 10⁻³). Classifiers were also able to stratify variants into three clinically relevant risk categories with 85-100% accuracy, depending on which of the six quantitative traits was used for training. Finally, we characterized 11 additional CFTR variants using clinical sweat chloride testing, two functional assays, or all three diagnostics, and validated our classifier using blind prediction. Predictions were within the measured sweat chloride range for seven of eight variants, and captured the differential impact of specific variants on the two functional assays. This work demonstrates a promising and novel framework for assessing the impact of genetic variation.
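The evaluation described above (predict a continuous trait under cross-validation, then correlate held-out predictions with measured values) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the abstract does not name the learning algorithm, the variant features, or the cross-validation scheme, so a one-feature least-squares line, leave-one-out splits, and synthetic numbers stand in for them. The names `fit_line`, `loo_predictions`, and `pearson_r` are hypothetical helpers, not the authors' code.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def loo_predictions(xs, ys):
    """Leave-one-out cross-validation: predict each point from all the others."""
    preds = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        intercept, slope = fit_line(train_x, train_y)
        preds.append(intercept + slope * xs[i])
    return preds


def pearson_r(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


# SYNTHETIC data, not from the paper: one hypothetical per-variant score and a
# made-up continuous severity measurement for each of eight variants.
scores = [0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.75, 0.90]
measured = [32.0, 41.0, 55.0, 60.0, 78.0, 81.0, 97.0, 110.0]

predicted = loo_predictions(scores, measured)
print("held-out predictions:", [round(p, 1) for p in predicted])
print("correlation with measured values:", pearson_r(measured, predicted))
```

Because every prediction is made for a variant that was held out of the fit, the correlation reflects generalization rather than goodness of fit on the training data, which is the property the cross-validated correlations in the abstract rely on.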

[18]  Michael Krawczak,et al.  Where genotype is not predictive of phenotype: towards an understanding of the molecular basis of reduced penetrance in human inherited disease , 2013, Human Genetics.

[19]  Weimin Sun,et al.  Extensive sequencing of the cystic fibrosis transmembrane regulator gene: Assay validation and unexpected benefits of developing a comprehensive test , 2003, Genetics in Medicine.

[20]  Fernando A Bozza,et al.  Cytokine profiles as markers of disease severity in sepsis: a multiplex analysis , 2007, Critical care.

[21]  Rachel Karchin,et al.  Phenotype‐optimized sequence ensembles substantially improve prediction of disease‐causing mutation in cystic fibrosis , 2012, Human mutation.

[22]  L. Almasy,et al.  Endophenotypes as quantitative risk factors for psychiatric disease: rationale and study design. , 2001, American journal of medical genetics.

[23]  K. Linton,et al.  Structure of ABC transporters. , 2011, Essays in biochemistry.

[24]  Eva Rönmark,et al.  Health-related quality of life is related to COPD disease severity , 2005, Health and quality of life outcomes.

[25]  Julie D Thompson,et al.  Multiple Sequence Alignment Using ClustalW and ClustalX , 2003, Current protocols in bioinformatics.

[26]  A. Sidow,et al.  Physicochemical constraint violation by missense substitutions mediates impairment of protein function and disease severity. , 2005, Genome research.

[27]  Michael R Knowles,et al.  Genetic modifiers of lung disease in cystic fibrosis. , 2005, The New England journal of medicine.

[28]  M. Corey,et al.  Complex two-gene modulation of lung disease severity in children with cystic fibrosis. , 2008, The Journal of clinical investigation.

[29]  Rappold,et al.  Human Molecular Genetics , 1996, Nature Medicine.

[30]  James T. L. Mah,et al.  In silico SNP analysis and bioinformatics tools: a review of the state of the art to aid drug discovery. , 2011, Drug discovery today.

[31]  M. Schwartz,et al.  Association of mannose-binding lectin gene heterogeneity with severity of lung disease and survival in cystic fibrosis. , 1999, The Journal of clinical investigation.

[32]  T. Scanlin,et al.  Clinical evidence that V456A is a Cystic Fibrosis causing mutation in South Asians. , 2012, Journal of Cystic Fibrosis.

[33]  S. Harvey,et al.  Modeling the Conformational Changes Underlying Channel Opening in CFTR , 2013, PloS one.

[34]  Andrew J. Grimm,et al.  Interpreting missense variants: comparing computational methods in human disease genes CDKN2A, MLH1, MSH2, MECP2, and tyrosinase (TYR) , 2007, Human mutation.

[35]  P. Waters,et al.  Human phenylalanine hydroxylase mutations and hyperphenylalaninemia phenotypes: a metanalysis of genotype-phenotype correlations. , 1997, American journal of human genetics.

[36]  M. Gill,et al.  Functional genomics and schizophrenia: endophenotypes and mutant models. , 2007, The Psychiatric clinics of North America.

[37]  Anavaj Sakuntabhai,et al.  A variant in the CD209 promoter is associated with severity of dengue disease , 2005, Nature Genetics.

[38]  Tom R. Gaunt,et al.  Predicting the Functional, Molecular, and Phenotypic Consequences of Amino Acid Substitutions using Hidden Markov Models , 2012, Human mutation.