COSMIC Cancer Gene Census 3D database: understanding the impacts of mutations on cancer targets

Abstract

Mutations in hallmark genes are believed to be the main drivers of cancer progression. These mutations are recorded in the Catalogue of Somatic Mutations in Cancer (COSMIC). Understanding at the structural level where these mutations occur (in protein–protein interfaces, active sites or deoxyribonucleic acid (DNA) interfaces), and predicting their impacts with a variety of computational tools, is crucial for successful drug discovery and development. The COSMIC Cancer Gene Census currently lists 723 genes. Owing to the complexity of their gene products, experimental structures with structural coverage between 90% and 100% have been solved for only 87 of these genes. Here, we present a comprehensive, user-friendly web interface (https://cancer-3d.com/) to 714 modelled cancer-related genes, including homo-oligomers, hetero-oligomers, transmembrane proteins and complexes with DNA, ribonucleic acid, ligands and co-factors. Using the SDM and mCSM software, we predicted the impacts of reported mutations on protein stability, protein–protein interface affinity and protein–nucleic acid complex affinity. We also predicted intrinsically disordered regions using DISOPRED3.
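The "structural coverage between 90% and 100%" criterion can be made concrete with a small sketch. It assumes that coverage means the fraction of a protein's sequence residues lying in at least one experimentally resolved segment. This is an assumption, because the abstract does not define the measure. The Python below merges resolved residue ranges, computes that fraction, and lists which reported mutation positions fall outside every resolved range and would therefore depend on a model. The function names and example numbers are hypothetical, not part of the database.

```python
from typing import Iterable, List, Tuple

# Hypothetical representation: a resolved segment is a 1-based, inclusive
# residue range (start, end) on the full-length protein sequence.
Segment = Tuple[int, int]


def merge_segments(segments: Iterable[Segment]) -> List[Segment]:
    """Merge overlapping or adjacent residue ranges into a sorted, disjoint list."""
    merged: List[Segment] = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous range, so extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def structural_coverage(seq_length: int, segments: Iterable[Segment]) -> float:
    """Fraction of residues 1..seq_length covered by at least one segment."""
    covered = 0
    for start, end in merge_segments(segments):
        # Clip each merged range to the sequence bounds before counting residues.
        lo, hi = max(start, 1), min(end, seq_length)
        if hi >= lo:
            covered += hi - lo + 1
    return covered / seq_length


def uncovered_positions(positions: Iterable[int], segments: Iterable[Segment]) -> List[int]:
    """Mutation positions that lie outside every resolved segment."""
    merged = merge_segments(segments)
    return [p for p in positions if not any(s <= p <= e for s, e in merged)]


if __name__ == "__main__":
    # Hypothetical 500-residue protein with two overlapping experimental structures.
    resolved = [(1, 250), (200, 460)]
    print(structural_coverage(500, resolved))          # 0.92
    print(uncovered_positions([120, 480], resolved))   # [480]
```

Merging the ranges first ensures that residues seen in several overlapping structures are counted only once. Under this assumed definition, the example protein (92% coverage) would fall in the 90% to 100% band, while a mutation at position 480 would still need a model.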
