PROSPER: An Integrated Feature-Based Tool for Predicting Protease Substrate Cleavage Sites

The ability to catalytically cleave protein substrates after synthesis is fundamental for all forms of life. Accordingly, site-specific proteolysis is one of the most important post-translational modifications. The key to understanding the physiological role of a protease is to identify its natural substrate(s). Knowledge of the substrate specificity of a protease can dramatically improve our ability to predict its target protein substrates, but this information must be used effectively if in silico approaches are to identify protein substrates efficiently. To address this problem, we present PROSPER, an integrated feature-based server for in silico identification of protease substrates and their cleavage sites for twenty-four different proteases. PROSPER combines established specificity information for these proteases (derived from the MEROPS database) with a machine learning approach to predict protease cleavage sites using different but complementary sequence and structural characteristics. Features used by PROSPER include the local amino acid sequence profile, predicted secondary structure, solvent accessibility and predicted native disorder. Thus, for proteases with known amino acid specificity, PROSPER provides a convenient, ready-to-use tool for identifying protein substrates of these enzymes. Systematic prediction analysis for the twenty-four proteases thus far included in the database revealed that the features incorporated in the tool substantially improve cleavage site prediction, as evidenced by their contribution to correctly identifying known cleavage sites in substrates of these enzymes. In comparison with two state-of-the-art prediction tools, PoPS and SitePrediction, PROSPER achieves greater accuracy and coverage.
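The abstract lists the per-residue features PROSPER draws on but not how they are arranged for a learner. As a rough illustration only, the sketch below assumes a common windowed layout: for a candidate scissile bond between P1 and P1' (Schechter and Berger nomenclature), it concatenates features for positions P4 through P4'. The one-hot residue encoding is a simple stand-in for the sequence profile, and the secondary-structure string (H/E/C), relative solvent accessibility and disorder arrays are assumed to come from external predictors. The window size, encoding and the names `window_features`, `one_hot` and `PER_POSITION` are hypothetical, not PROSPER's documented design.

```python
from typing import List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
SS_STATES = "HEC"  # helix, strand, coil (assumed 3-state predictor output)
# 20 residue bits + 3 secondary-structure bits + solvent accessibility + disorder
PER_POSITION = len(AMINO_ACIDS) + len(SS_STATES) + 2


def one_hot(symbol: str, alphabet: str) -> List[float]:
    """One-hot vector for a single character; unknown symbols (e.g. 'X') map to all zeros."""
    vec = [0.0] * len(alphabet)
    idx = alphabet.find(symbol)
    if idx >= 0:
        vec[idx] = 1.0
    return vec


def window_features(
    sequence: str,
    p1_index: int,
    ss: str,
    rsa: Sequence[float],
    disorder: Sequence[float],
    left: int = 4,
    right: int = 4,
) -> List[float]:
    """Hypothetical features for the bond between sequence[p1_index] (P1) and the next residue (P1').

    Covers P<left>..P1 and P1'..P<right>'; positions beyond either terminus are zero-padded.
    """
    features: List[float] = []
    for offset in range(-left + 1, right + 1):
        pos = p1_index + offset
        if 0 <= pos < len(sequence):
            features.extend(one_hot(sequence[pos], AMINO_ACIDS))
            features.extend(one_hot(ss[pos], SS_STATES))
            features.append(float(rsa[pos]))
            features.append(float(disorder[pos]))
        else:
            features.extend([0.0] * PER_POSITION)
    return features
```

With the default P4 to P4' window this yields 8 × 25 = 200 values per candidate bond; in a setup like this, one such vector per bond would be passed to a model trained separately for each protease.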
To our knowledge, PROSPER is the first comprehensive server capable of predicting cleavage sites of multiple proteases within a single substrate sequence using machine learning techniques. It is freely available at http://lightning.med.monash.edu.au/PROSPER/.
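Predicting cleavage sites of several proteases within one substrate can be pictured as running one scorer per protease over every peptide bond of the sequence. The sketch below does not reproduce PROSPER's trained models: `trypsin_like` and `caspase3_like` are hypothetical toy rules (cleavage after Lys/Arg unless P1' is Pro; cleavage after Asp, scored higher when Asp also occupies P4 as in a DxxD motif), and the 0.5 threshold is arbitrary.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical scorer type: maps (sequence, P1 index) to a score in [0, 1] for the bond P1-P1'.
Scorer = Callable[[str, int], float]


def trypsin_like(seq: str, i: int) -> float:
    """Toy rule: cleave after Lys/Arg unless the next residue (P1') is Pro."""
    if i + 1 < len(seq) and seq[i] in "KR" and seq[i + 1] != "P":
        return 1.0
    return 0.0


def caspase3_like(seq: str, i: int) -> float:
    """Toy rule: require Asp at P1; score higher if Asp is also at P4 (DxxD)."""
    if seq[i] != "D":
        return 0.0
    return 0.9 if i >= 3 and seq[i - 3] == "D" else 0.4


def scan(
    seq: str, scorers: Dict[str, Scorer], threshold: float = 0.5
) -> List[Tuple[str, int, float]]:
    """Return (protease, 1-based P1 position, score) for every bond scoring >= threshold."""
    hits: List[Tuple[str, int, float]] = []
    for i in range(len(seq) - 1):  # bond between seq[i] (P1) and seq[i + 1] (P1')
        for name, score_fn in scorers.items():
            score = score_fn(seq, i)
            if score >= threshold:
                hits.append((name, i + 1, score))
    return hits


if __name__ == "__main__":
    proteases = {"trypsin-like": trypsin_like, "caspase-3-like": caspase3_like}
    print(scan("ADEVDGAKPLRA", proteases))
```

For the example substrate this prints `[('caspase-3-like', 5, 0.9), ('trypsin-like', 11, 1.0)]`: the bond after DEVD and the bond after Arg are reported, while the Lys-Pro bond at position 8 is skipped by the trypsin-like rule.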