Proteome-wide Structural Analysis of PTM Hotspots Reveals Regulatory Elements Predicted to Impact Biological Function and Disease

Post-translational modifications (PTMs) regulate protein behavior by modulating protein-protein interactions, enzymatic activity, and protein stability, processes essential to the translation of genotype to phenotype in eukaryotes. Currently, fewer than 4% of all eukaryotic PTMs are reported to have biological function, a statistic that continues to decrease as the rate of PTM detection increases. Previously, we developed SAPH-ire (Structural Analysis of PTM Hotspots), a method for prioritizing PTM function potential that has been used effectively to reveal novel PTM regulatory elements in discrete protein families (Dewhurst et al., 2015). Here, we apply SAPH-ire to the set of eukaryotic protein families containing experimental PTM and 3D structure data, capturing 1,325 protein families with 50,839 unique PTM sites organized into 31,747 modified alignment positions (MAPs), of which 2,010 (∼6%) possess known biological function. We show that an artificial neural network model (SAPH-ire NN) trained to identify MAP hotspots with biological function produces prediction outcomes that far surpass those obtained with single hotspot features, including nearest neighbor PTM clustering methods. We find the greatest enhancement in prediction for positions with PTM counts of five or fewer, which represent 98% of all MAPs in the eukaryotic proteome and 90% of all MAPs found to have biological function. Analysis of the top 1,092 MAP hotspots revealed 267 of truly unknown function (containing 5,443 distinct PTMs). Of these, 165 hotspots could be mapped to human KEGG pathways for normal and/or disease physiology. Many high-ranking hotspots were also found to be disease-associated pathogenic sites of amino acid substitution despite the lack of observable PTM in the human protein family member.
Taken together, these experiments demonstrate that the functional relevance of a PTM can be predicted effectively by neural network models, revealing a large but testable body of potential regulatory elements that impact hundreds of different biological processes important in eukaryotic biology and human health.
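
The organization of PTM sites into modified alignment positions (MAPs) can be illustrated with a minimal sketch. It assumes a gapped multiple sequence alignment per protein family and a list of PTM sites given as residue positions; the function names, the toy alignment, and the toy PTM records are all hypothetical and are not taken from the SAPH-ire implementation. The sketch only shows how sites from different family members collapse onto a shared alignment column, and how a per-MAP PTM count can then be derived.

```python
from collections import defaultdict


def residue_to_column(aligned_seq):
    """Map 1-based residue positions to 0-based alignment columns, skipping gaps."""
    mapping = {}
    residue_index = 0
    for column, char in enumerate(aligned_seq):
        if char != "-":
            residue_index += 1
            mapping[residue_index] = column
    return mapping


def group_into_maps(alignment, ptm_sites):
    """Group PTM sites by the alignment column they fall on.

    alignment: {protein_id: gapped aligned sequence} for one family (hypothetical input)
    ptm_sites: iterable of (protein_id, residue_position, ptm_type)
    """
    column_maps = {pid: residue_to_column(seq) for pid, seq in alignment.items()}
    maps = defaultdict(list)
    for protein_id, position, ptm_type in ptm_sites:
        column = column_maps[protein_id][position]
        maps[column].append((protein_id, position, ptm_type))
    return dict(maps)


# Toy family of two aligned members (illustrative data only).
alignment = {
    "protA": "MS-KT",
    "protB": "MSAK-",
}
ptm_sites = [
    ("protA", 2, "phosphorylation"),  # S2 of protA lies in column 1
    ("protB", 2, "phosphorylation"),  # S2 of protB lies in column 1
    ("protB", 4, "acetylation"),      # K4 of protB lies in column 3
    ("protA", 3, "acetylation"),      # K3 of protA lies in column 3
]

maps = group_into_maps(alignment, ptm_sites)
ptm_counts = {column: len(sites) for column, sites in maps.items()}
```

In this toy case, four PTM sites collapse into two MAPs with a PTM count of two each. A count of this kind is one example of a single hotspot feature; the model described above combines multiple features rather than relying on any one of them.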

[1] C. Long, et al. Prevalence of Desmin Mutations in Dilated Cardiomyopathy, 2007, Circulation.

[2] Zsuzsanna Dosztányi, et al. IUPred: web server for the prediction of intrinsically unstructured regions of proteins based on estimated energy content, 2005, Bioinform.

[3] Hsien-Da Huang, et al. dbPTM 3.0: an informative resource for investigating substrate site specificity and functional association of protein post-translational modifications, 2012, Nucleic Acids Res.

[4] P. Stenson, et al. The Human Gene Mutation Database: building a comprehensive mutation repository for clinical and molecular genetics, diagnostic testing and personalized genomic medicine, 2013, Human Genetics.

[5] S. Legartová, et al. Post‐Translational Modifications of Histones in Human Sperm, 2015, Journal of cellular biochemistry.

[6] Alan M. Moses, et al. Evolution of characterized phosphorylation sites in budding yeast, 2010, Molecular biology and evolution.

[7] Pedro Beltrão, et al. Prediction of Functionally Important Phospho-Regulatory Events in Xenopus laevis Oocytes, 2015, PLoS Comput. Biol.

[8] Qingyu Xiao, et al. Prioritizing functional phosphorylation sites based on multiple feature integration, 2016, Scientific Reports.

[9] Robert C. Edgar, et al. MUSCLE: a multiple sequence alignment method with reduced time and space complexity, 2004, BMC Bioinformatics.

[10] H. Kurumizaka, et al. Charge‐neutralization effect of the tail regions on the histone H2A/H2B dimer structure, 2015, Protein science : a publication of the Protein Society.

[11] J. Workman, et al. Introducing the acetylome, 2009, Nature Biotechnology.

[12] T. N. Bhat, et al. The Protein Data Bank, 2000, Nucleic Acids Res.

[13] T. Kouzarides. Chromatin Modifications and Their Function, 2007, Cell.

[14] Hiroyuki Ogata, et al. KEGG: Kyoto Encyclopedia of Genes and Genomes, 1999, Nucleic Acids Res.

[15] Margaret Jacobs, et al. Identification of the Phospholipid Binding Site in the Vitamin K-dependent Blood Coagulation Protein Factor IX, 1996, The Journal of Biological Chemistry.

[16] G. Lienhard, et al. Non-functional phosphorylations?, 2008, Trends in biochemical sciences.

[17] Tom Fawcett, et al. An introduction to ROC analysis, 2006, Pattern Recognit. Lett.

[18] Ricardo Villamarín-Salomón, et al. ClinVar: public archive of interpretations of clinically relevant variants, 2015, Nucleic Acids Res.

[19] Tony Pawson, et al. Multisite phosphorylation of a CDK inhibitor sets a threshold for the onset of DNA replication, 2001, Nature.

[20] R. Arkowitz, et al. Chemical gradients and chemotropism in yeast, 2009, Cold Spring Harbor perspectives in biology.

[21] W. Lim, et al. Systematic Functional Prioritization of Protein Posttranslational Modifications, 2012, Cell.

[22] P. Price, et al. Molecular cloning of matrix Gla protein: implications for substrate recognition by the vitamin K-dependent gamma-carboxylase, 1987, Proceedings of the National Academy of Sciences of the United States of America.

[23] E. P. Kennedy, et al. The enzymatic phosphorylation of proteins, 1954, The Journal of biological chemistry.

[24] Peer Bork, et al. Deciphering a global network of functionally associated post-translational modifications, 2012, Molecular systems biology.

[25] Dirk Walther, et al. Characterization and Prediction of Protein Phosphorylation Hotspots in Arabidopsis thaliana, 2012, Front. Plant Sci.

[26] Peer Bork, et al. PTMcode: a database of known and predicted functional associations between post-translational modifications in proteins, 2012, Nucleic Acids Res.

[27] D. M. Titterington, et al. Neural Networks: A Review from a Statistical Perspective, 1994.

[28] D. Meek. Regulation of the p53 response and its relationship to cancer, 2015, The Biochemical journal.

[29] L. Waddell, et al. Importance and challenge of making an early diagnosis in LMNA-related muscular dystrophy, 2012, Neurology.

[30] P. Bandyopadhyay. Vitamin K-dependent gamma-glutamylcarboxylation: an ancient posttranslational modification, 2008, Vitamins and hormones.

[31] Leo Breiman, et al. [Neural Networks: A Review from Statistical Perspective]: Comment, 1994.

[32] Trey Ideker, et al. Cytoscape 2.8: new features for data integration and network visualization, 2010, Bioinform.

[33] Matthew P. Torres, et al. Structural Analysis of PTM Hotspots (SAPH-ire) – A Quantitative Informatics Method Enabling the Discovery of Novel Regulatory Elements in Protein Families, 2015, Molecular & Cellular Proteomics.

[34] Bin Zhang, et al. PhosphoSitePlus, 2014: mutations, PTMs and recalibrations, 2014, Nucleic Acids Res.

[35] Andrew R. Barron. [Neural Networks: A Review from Statistical Perspective]: Comment, 1994.

[36] J. Crosson, et al. Arrhythmogenic right ventricular dysplasia/cardiomyopathy, 2017, Cardiology in the Young.

[37] T. Umehara, et al. Intra- and inter-nucleosomal interactions of the histone H4 tail revealed with a human nucleosome core particle with genetically-incorporated H4 tetra-acetylation, 2015, Scientific Reports.

[38] D. Fatkin, et al. Identification and Functional Characterization of Cardiac Troponin I As a Novel Disease Gene in Autosomal Dominant Dilated Cardiomyopathy, 2009, Circulation research.

[39] J. Saffitz, et al. A novel dominant mutation in plakoglobin causes arrhythmogenic right ventricular cardiomyopathy, 2007, American journal of human genetics.

[40] J. Mogensen, et al. Novel mutation in cardiac troponin I in recessive idiopathic dilated cardiomyopathy, 2004, The Lancet.

[41] D. Tregouet, et al. High prevalence of laminopathies among patients with metabolic syndrome, 2011, Human molecular genetics.

[42] Yolanda T. Chong, et al. Exploring the Yeast Acetylome Using Functional Genomics, 2012, Cell.

[43] H. Calkins, et al. Arrhythmogenic right ventricular dysplasia/cardiomyopathy (ARVD/C): a multidisciplinary study: design and protocol, 2003, Circulation.

[44] A. Marchese, et al. G protein-coupled receptor sorting to endosomes and lysosomes, 2008, Annual review of pharmacology and toxicology.

[45] P. Devreotes, et al. G-protein signaling in chemotaxis, 2010.

[46] Franca Fraternali, et al. POPS: a fast algorithm for solvent accessible surface areas at atomic and residue level, 2003, Nucleic Acids Res.

[47] Philippe P. Roux, et al. Activation and Function of the MAPKs and Their Substrates, the MAPK-Activated Protein Kinases, 2011, Microbiology and Molecular Biology Reviews.

[48] C. Landry, et al. Weak functional constraints on phosphoproteomes, 2009, Trends in genetics : TIG.