Assessing the performance of in silico methods for predicting the pathogenicity of variants in the gene CHEK2, among Hispanic females with breast cancer

The availability of disease‐specific genomic data is critical for developing new computational methods that predict the pathogenicity of human variants and for advancing precision medicine. However, the lack of gold standards for training and benchmarking such methods remains one of the greatest challenges in the field. In response, the scientific community is invited to participate in the Critical Assessment of Genome Interpretation (CAGI), in which unpublished disease variants are made available for classification by in silico methods. As part of the CAGI‐5 challenge, we evaluated the performance of 18 submissions and three additional methods in predicting the pathogenicity of single nucleotide variants (SNVs) in checkpoint kinase 2 (CHEK2) for cases of breast cancer in Hispanic females. The assessment also considered the efficacy of the analysis method and the setup of the challenge. The results indicated that, although the challenge could benefit from additional participant data, the combined generalized linear model analysis and odds of pathogenicity analysis provided a framework for evaluating the submitted methods for SNV pathogenicity identification and for comparing them with other available methods. The outcome of this challenge and the approaches used can help guide further advances in identifying SNV‐disease relationships.
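
To make the idea of a generalized linear model analysis concrete, the following is a minimal sketch, not the assessment's actual procedure. It assumes a logistic (binomial) model that relates a predictor's per-variant score to case versus control status. The function name `fit_logistic`, the scores, and the carrier counts are all hypothetical and chosen only for illustration.

```python
import math

def fit_logistic(scores, case_counts, control_counts, iterations=50):
    """Fit logit(P(case)) = b0 + b1 * score by Newton-Raphson.

    Each variant contributes its carrier counts among cases and controls
    (grouped binomial data). Hypothetical helper, not from the paper.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # entries of the information matrix
        for x, k, m in zip(scores, case_counts, control_counts):
            n = k + m
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            resid = k - n * p
            w = n * p * (1.0 - p)
            g0 += resid
            g1 += resid * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if abs(det) < 1e-12:
            break  # information matrix is singular; stop updating
        # Solve the 2x2 Newton system for the parameter update.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if max(abs(d0), abs(d1)) < 1e-10:
            break
    return b0, b1

# Hypothetical predictor scores and carrier counts for five variants.
scores = [0.1, 0.3, 0.5, 0.7, 0.9]
case_counts = [2, 3, 5, 7, 8]
control_counts = [8, 7, 5, 3, 2]

b0, b1 = fit_logistic(scores, case_counts, control_counts)

# Odds ratio for carrying a variant scored 0.9 versus one scored 0.1.
odds_ratio = math.exp(b1 * (0.9 - 0.1))
print(f"intercept={b0:.3f} slope={b1:.3f} odds_ratio={odds_ratio:.2f}")
```

Under these assumptions, a positive slope means that higher predictor scores are associated with greater odds of a carrier being a case, which is one way a method's scores could be checked against case-control data.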
