Identification of Sequence Variants within Experimentally Validated Protein Interaction Sites Provides New Insights into Molecular Mechanisms of Disease Development

Protein interactions (PI) underlie complex biological processes. Protein interaction partners include DNA, RNA, ions, small chemical compounds, and proteins (protein‐protein interactions; PPI). Analysis of sequence variants within regions corresponding to experimentally validated PI sites offers new opportunities for understanding complex diseases. Such information has not been systematically collected, because the relevant data are dispersed across databases and publications. Sequence variants and PI regions were obtained from the UniProt database, and the position of each variant was compared with the start and end positions of each PPI region. Associations of sequence variants with phenotypes were obtained from databases including COSMIC, GAD, PharmGKB, and dbSNP. We developed a catalogue of 603 sequence variants located within regions corresponding to experimentally validated PI sites, mostly PPI regions. These sequence variants had previously been associated with risk of cancer and of reproductive, ageing-related, renal, and immune system diseases. The catalogue integrates information from multiple research papers and databases, adds a new layer of annotation, and supports the generation of new hypotheses. It also provides a baseline for prioritizing sequence variants that may affect protein function and binding sites. The study contributes to the development of the proteogenomics field and provides new insights into the molecular mechanisms underlying disease development.
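The core mapping step, checking whether a variant's residue position falls between the start and end positions of an annotated interaction region on the same protein, is an interval-containment test. The following is a minimal Python sketch of that test, not the authors' pipeline. The record layout (`Variant`, `InteractionRegion`), the use of 1-based inclusive coordinates, and the placeholder accessions and rsIDs are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    accession: str   # protein accession (placeholder values below)
    position: int    # assumed 1-based residue position of the substitution
    variant_id: str  # e.g. a dbSNP-style identifier (hypothetical here)


@dataclass(frozen=True)
class InteractionRegion:
    accession: str
    start: int       # assumed 1-based, inclusive
    end: int         # assumed 1-based, inclusive
    partner: str     # interaction partner type, e.g. "protein", "DNA", "ion"


def variants_in_regions(variants, regions):
    """Return (variant, region) pairs where the variant lies inside the region."""
    # Group regions by protein so each variant is only tested against
    # regions annotated on the same protein.
    by_protein = defaultdict(list)
    for region in regions:
        by_protein[region.accession].append(region)

    hits = []
    for v in variants:
        for r in by_protein.get(v.accession, []):
            if r.start <= v.position <= r.end:  # inclusive on both ends
                hits.append((v, r))
    return hits


if __name__ == "__main__":
    regions = [
        InteractionRegion("P00001", 10, 50, "protein"),
        InteractionRegion("P00001", 120, 140, "DNA"),
    ]
    variants = [
        Variant("P00001", 25, "rs0000001"),  # inside the protein-binding region
        Variant("P00001", 60, "rs0000002"),  # between the two regions
        Variant("P00002", 25, "rs0000003"),  # different protein, no regions
    ]
    for v, r in variants_in_regions(variants, regions):
        print(v.variant_id, r.partner, r.start, r.end)
```

Grouping regions by accession keeps each lookup restricted to one protein. For very large region sets, sorting regions by start position and using binary search, or an interval tree, would avoid scanning every region of a protein for every variant.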
