Working toward precision medicine: Predicting phenotypes from exomes in the Critical Assessment of Genome Interpretation (CAGI) challenges

Precision medicine aims to predict a patient's disease risk and best therapeutic options by using that individual's genetic sequencing data. The Critical Assessment of Genome Interpretation (CAGI) is a community experiment consisting of genotype–phenotype prediction challenges; participants build models, undergo assessment, and share key findings. For CAGI 4, three challenges involved using exome‐sequencing data: Crohn's disease, bipolar disorder, and warfarin dosing. Earlier CAGI rounds included previous versions of the Crohn's disease challenge. Here, we discuss the range of techniques used for phenotype prediction as well as the methods used for assessing predictive models. Additionally, we outline some of the difficulties associated with making predictions and evaluating them. The lessons learned from the exome challenges can be applied to both research and clinical efforts to improve phenotype prediction from genotype. In addition, these challenges serve as a vehicle for sharing clinical and research exome data in a secure manner with scientists who have a broad range of expertise, contributing to a collaborative effort to advance our understanding of genotype–phenotype relationships.

Alexander A. Morgan | Yanay Ofran | Predrag Radivojac | Abhishek Niroula | Mauno Vihinen | Biao Li | Steven E. Brenner | Rita Casadio | Matthew D. Edwards | Russ B. Altman | Pietro Di Lena | Roxana Daneshjou | Mehdi Pirooznia | Andre Franke | Samuele Bovo | Sean D. Mooney | Ron Unger | Manuel Giollo | Emanuela Leonardi | Teri E. Klein | Yana Bromberg | John Moult | Marco Carraro | David Gifford | Laksshman Sundaram | Giulia Babbi | David T. Jones | Kunal Kundu | Susanna Repo | Silvio C. E. Tosatto | Roger A. Hoskins | Vikas Pejaver | Lipika R. Pal | Billy Chang | Yuxiang Jiang | Kymberleigh A. Pagel | James B. Potash | Peter Zandi | Pier L. Martelli | Richard McCombie | Yizhou Yin | Sohela Shah | Johnathan R. Azaria | Maggie H. Wang | Xiaolin Li | Yanran Wang | Britt-Sabina Petersen | Alessandra Gasparini | Carlo Ferrari | Rajendra Rana Bhat | Eran Bachar
