Prediction of protease substrates using sequence and structure features

MOTIVATION: Granzyme B (GrB) and caspases cleave specific protein substrates to induce apoptosis in virally infected and neoplastic cells. While substrates for both types of proteases have been determined experimentally, many more remain to be discovered in humans and other metazoans. Here, we present a bioinformatics method based on support vector machine (SVM) learning that identifies sequence and structural features important for protease recognition of substrate peptides and then uses these features to predict novel substrates. Our approach can act as a convenient hypothesis generator, guiding future experiments by high-confidence identification of peptide-protein partners.

RESULTS: The method is benchmarked on the known substrates of both protease types, including our literature-curated GrB substrate set (GrBah). On these benchmark sets, the method outperforms a number of other methods that consider sequence only, achieving a true positive rate (TPR) of 0.87 at a false positive rate (FPR) of 0.13 for caspase substrates, and a TPR of 0.79 at an FPR of 0.21 for GrB substrates. The method is then applied to approximately 25 000 proteins in the human proteome to generate a ranked list of predicted substrates of each protease type. Two of these predictions, AIF-1 and SMN1, were selected for further experimental analysis, and each was validated as a GrB substrate.

AVAILABILITY: All predictions for both protease types are publicly available at http://salilab.org/peptide. A web server at the same site allows users to train new SVM models and make predictions for any protein that recognizes specific oligopeptide ligands.
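The TPR and FPR figures quoted above are computed from a classifier's scores at a chosen decision threshold. The following is a minimal sketch of that calculation, not the authors' evaluation code: the function name `tpr_fpr`, the threshold of 0.0, and the example labels and scores are all hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple


def tpr_fpr(
    labels: Sequence[int],
    scores: Sequence[float],
    threshold: float = 0.0,
) -> Tuple[float, float]:
    """Return (TPR, FPR) when items scoring >= threshold are called substrates.

    labels: 1 for a known cleaved peptide, 0 for a non-substrate (assumed encoding).
    scores: classifier decision values, e.g. signed distances from an SVM margin.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")

    tp = fp = fn = tn = 0
    for label, score in zip(labels, scores):
        predicted_positive = score >= threshold
        if label == 1 and predicted_positive:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1

    # Guard against empty classes so the function never divides by zero.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Hypothetical benchmark: four known substrates followed by four non-substrates.
example_labels = [1, 1, 1, 1, 0, 0, 0, 0]
example_scores = [1.2, 0.4, -0.3, 0.9, -1.1, 0.2, -0.7, -0.5]
example_tpr, example_fpr = tpr_fpr(example_labels, example_scores)
```

Sweeping `threshold` over the range of scores and recording each (TPR, FPR) pair traces out a ROC curve, from which a single operating point such as those reported in RESULTS can be read off.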
