A Probabilistic Model to Predict Clinical Phenotypic Traits from Genome Sequencing

Genetic screening is becoming possible on an unprecedented scale. However, its utility remains controversial. Although most variant genotypes cannot be easily interpreted, many individuals nevertheless attempt to interpret their genetic information. Initiatives such as the Personal Genome Project (PGP) and Illumina's Understand Your Genome are sequencing thousands of adults, collecting phenotypic information, and developing computational pipelines to identify the most important variant genotypes harbored by each individual. These pipelines consider database and allele-frequency annotations as well as bioinformatics classifications. We propose that the next step will be to integrate these different sources of information to estimate the probability that a given individual has specific phenotypes of clinical interest. To this end, we have designed a Bayesian probabilistic model to estimate the probability of dichotomous phenotypes. When applied to a cohort from PGP, predictions of Gilbert syndrome, Graves' disease, non-Hodgkin lymphoma, and various blood groups were accurate: individuals manifesting the phenotype in question had the highest, or among the highest, predicted probabilities. Thirty-eight PGP phenotypes (26%) were predicted with an area under the ROC curve (AUC) > 0.7, and 23 of these (15.8% of the phenotypes evaluated) were statistically significant based on permutation tests. Moreover, in a Critical Assessment of Genome Interpretation (CAGI) blinded prediction experiment, the models were used to match 77 PGP genomes to phenotypic profiles, generating the most accurate prediction of 16 submissions, according to an independent assessor. Although the models are not yet accurate enough for diagnostic use, we expect their performance to improve with the growth of publicly available genomics data and with model refinement by domain experts.
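
The evaluation described above (an AUC per phenotype, with significance assessed by permutation) can be illustrated with a minimal sketch. The abstract does not say how the AUC or the permutation test was computed, so everything below is an assumption made for illustration: the function names `auc` and `permutation_p_value`, the number of permutations, the one-sided test, and the toy scores and labels are hypothetical and are not taken from the study.

```python
import random


def auc(scores, labels):
    """Rank-based AUC: the fraction of (positive, negative) pairs in which
    the positive individual has the higher score; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def permutation_p_value(scores, labels, n_permutations=1000, seed=0):
    """One-sided permutation test (an assumed design): shuffle the phenotype
    labels and count how often the shuffled AUC reaches the observed AUC."""
    rng = random.Random(seed)
    observed = auc(scores, labels)
    shuffled = list(labels)
    at_least_as_high = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if auc(scores, shuffled) >= observed:
            at_least_as_high += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (at_least_as_high + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical predicted probabilities for one dichotomous phenotype
# (1 = individual reports the phenotype, 0 = does not).
toy_scores = [0.90, 0.80, 0.70, 0.30, 0.20, 0.10, 0.40, 0.05]
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]

observed_auc, p = permutation_p_value(toy_scores, toy_labels)
print(f"AUC = {observed_auc:.2f}, permutation p = {p:.3f}")
```

Shuffling the labels while keeping the predicted probabilities fixed gives a null distribution for the AUC under no association between prediction and phenotype, which is why a threshold such as AUC > 0.7 can be paired with a significance statement.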
