New analysis pipeline for high-throughput domain–peptide affinity experiments improves SH2 interaction data

Protein domain interactions with short linear peptides, such as Src homology 2 (SH2) domain interactions with phosphotyrosine (pTyr)-containing peptide motifs, are ubiquitous and important to many biochemical processes of the cell. The desire to map and quantify these interactions has driven the development of high-throughput (HTP) quantitative measurement techniques, such as microarray and fluorescence polarization assays. For example, in the last 15 years, experiments have progressed from measuring single interactions to covering 500,000 of the 5.5 million possible SH2-pTyr interactions in the human proteome. However, high variability in affinity measurements and disagreements between published datasets about which interactions are positive led us to re-evaluate the analysis methods and raw data of published SH2-pTyr HTP experiments. We identified several opportunities to improve both the identification of positive and negative interactions and the accuracy of affinity measurements. We implemented model fitting techniques that are more statistically appropriate for the non-linear SH2-pTyr interaction data. We developed a novel method to account for protein concentration errors due to impurities and degradation, and to address protein inactivity and aggregation. Our revised analysis increases the accuracy of reported affinities, reduces the false negative rate, and increases the amount of useful data by adding reliable true negative results. We demonstrate improved classification of binding versus non-binding interactions when using machine learning techniques, suggesting improved coherence in the reanalyzed datasets. We present revised SH2-pTyr affinity results and propose a new analysis pipeline for future HTP measurements of domain-peptide interactions.
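To illustrate the kind of model fitting referred to above, the following is a minimal sketch, not the pipeline itself. It assumes a one-site saturation binding model, signal = fmax * c / (kd + c), and uses the Akaike Information Criterion (AIC) to compare that model against a flat "no binding" model. Neither the model form nor the selection criterion is specified in the text above; both are assumptions made for illustration, and the function names (`fit_one_site`, `aic`, `classify`) and the concentration grid are hypothetical.

```python
import math

def fit_one_site(conc, signal):
    """Fit signal = fmax * c / (kd + c) by separable least squares.

    kd is scanned over a log-spaced grid (an assumed range of 1e-3 to 1e3,
    in the same units as conc); for each kd the best fmax has a closed form.
    Returns (kd, fmax, rss) for the grid point with the smallest residual
    sum of squares.
    """
    best = None
    for i in range(601):
        kd = 10 ** (-3 + 6 * i / 600)
        x = [c / (kd + c) for c in conc]          # fractional occupancy
        sxx = sum(v * v for v in x)
        fmax = sum(v * y for v, y in zip(x, signal)) / sxx
        rss = sum((y - fmax * v) ** 2 for v, y in zip(x, signal))
        if best is None or rss < best[2]:
            best = (kd, fmax, rss)
    return best

def aic(rss, n, k):
    """Least-squares AIC: n * ln(RSS / n) + 2k (RSS floored to avoid log(0))."""
    return n * math.log(max(rss, 1e-12) / n) + 2 * k

def classify(conc, signal):
    """Return (binds, kd, fmax); binds is True when the binding model has lower AIC."""
    n = len(signal)
    kd, fmax, rss_bind = fit_one_site(conc, signal)
    mean = sum(signal) / n
    rss_null = sum((y - mean) ** 2 for y in signal)   # flat, no-binding model
    binds = aic(rss_bind, n, 2) < aic(rss_null, n, 1)
    return binds, kd, fmax

# Synthetic example (made-up values): kd = 0.5, fmax = 100, small fixed noise.
conc = [0.01, 0.05, 0.1, 0.5, 1.0, 2.0, 5.0]
noise = [1.0, -1.5, 0.5, -0.5, 1.2, -0.8, 0.3]
saturating = [100 * c / (0.5 + c) + e for c, e in zip(conc, noise)]
flat = [10.2, 9.8, 10.1, 9.9, 10.0, 10.3, 9.7]

binds_sat, kd_sat, fmax_sat = classify(conc, saturating)
binds_flat, _, _ = classify(conc, flat)
```

Comparing models with an information criterion rather than a goodness-of-fit threshold lets a flat response be reported as a negative result instead of being discarded, which is one way to obtain true negatives of the kind described above.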
