A Bayesian semi-parametric model for thermal proteome profiling

The thermal stability of proteins can be altered when they interact with small molecules or other biomolecules, or when they are subject to post-translational modifications. Monitoring the thermal stability of proteins under various cellular perturbations can therefore provide insights into protein function, and can potentially identify drug targets and off-targets. Thermal proteome profiling is a highly multiplexed mass-spectrometry method for monitoring the melting behaviour of thousands of proteins in a single experiment. In essence, thermal proteome profiling assumes that proteins denature upon heating and hence become insoluble. Thus, by tracking the relative solubility of a protein at sequentially increasing temperatures, one can report on its thermal stability. Standard thermodynamics predicts a sigmoidal relationship between temperature and relative solubility, and this is the basis of current robust statistical procedures. However, current methods do not model deviations from this behaviour, and they do not quantify uncertainty in the melting profiles. To overcome these challenges, we propose the application of Bayesian functional data analysis tools, which allow complex temperature-solubility behaviours. Our methods have improved sensitivity over the state of the art, identify new drug-protein associations and have less restrictive assumptions than current approaches. Our methods allow for comprehensive analysis of proteins that deviate from the predicted sigmoid behaviour, and we uncover potentially biphasic phenomena in a series of published datasets.
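The sigmoidal relationship between temperature and relative solubility mentioned above can be illustrated with a minimal sketch. It assumes a three-parameter sigmoid of a form often used for melting curves; the function names, the parameter names `a`, `b` and `plateau`, and the numerical values are illustrative assumptions and are not taken from this work. The flexible deviation from the sigmoid that a semi-parametric model would add is not shown.

```python
import math


def sigmoid_melting_curve(temp_c, a, b, plateau):
    """Relative solubility at temperature temp_c (degrees Celsius).

    Assumed form: (1 - plateau) / (1 + exp(-(a / temp_c - b))) + plateau.
    For a > 0 the curve decreases from about 1 towards `plateau` on heating.
    """
    return (1.0 - plateau) / (1.0 + math.exp(-(a / temp_c - b))) + plateau


def melting_point(a, b, plateau):
    """Temperature at which the assumed curve equals 0.5 (needs plateau < 0.5)."""
    # Setting the curve to 0.5 and solving for temp_c gives
    # a / temp_c - b = log(1 - 2 * plateau).
    return a / (b + math.log(1.0 - 2.0 * plateau))


# Hypothetical parameter values, chosen only to give a plausible-looking curve.
a, b, plateau = 550.0, 10.0, 0.05
temperatures = [37, 41, 44, 47, 50, 53, 56, 59, 63, 67]
profile = [sigmoid_melting_curve(t, a, b, plateau) for t in temperatures]
tm = melting_point(a, b, plateau)
```

A shift in `tm` between treated and control conditions is the kind of summary that sigmoid-based analyses rely on; profiles that are not well described by this curve are the cases the abstract refers to as deviations.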
