DeepCleave: a deep learning predictor for caspase and matrix metalloprotease substrates and cleavage sites

MOTIVATION Proteases are enzymes that cleave target substrate proteins by catalyzing the hydrolysis of peptide bonds between specific amino acids. Although proteolysis regulated by proteases plays a central role in the 'life and death' of proteins, many of the corresponding substrates and their cleavage sites have not yet been identified. Accurate predictors of substrates and cleavage sites would facilitate understanding of proteases' functions and physiological roles. Deep learning is a promising approach for developing such predictors.

RESULTS We propose DeepCleave, the first deep learning-based predictor of protease-specific substrates and cleavage sites. DeepCleave takes protein substrate sequence data as input and employs convolutional neural networks with transfer learning to train accurate predictive models. The high predictive performance of our models stems from high-quality cleavage site features extracted from the substrate sequences through the deep learning process, and from the use of transfer learning, multiple kernels and an attention layer in the design of the deep network. Empirical tests against several related state-of-the-art methods demonstrate that DeepCleave outperforms them in predicting caspase and matrix metalloprotease substrate cleavage sites.

AVAILABILITY The DeepCleave webserver and source code are freely available at http://deepcleave.erc.monash.edu/.

SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
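The design elements named in the abstract (sequence input, convolution with multiple kernels, an attention layer) can be illustrated with a small, untrained sketch. This is not the DeepCleave implementation: the kernel widths, the random weights, the self-weighted softmax used as "attention", the example window, and all function names are assumptions chosen only to show how the pieces might fit together.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(peptide):
    """Encode a peptide as a list of 20-dimensional one-hot vectors."""
    return [[1.0 if aa == ref else 0.0 for ref in AMINO_ACIDS] for aa in peptide]


def conv1d(x, kernel):
    """Slide one filter (width x 20) along the sequence and apply ReLU."""
    width = len(kernel)
    out = []
    for start in range(len(x) - width + 1):
        total = sum(
            kernel[i][j] * x[start + i][j]
            for i in range(width)
            for j in range(len(AMINO_ACIDS))
        )
        out.append(max(0.0, total))
    return out


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


def attention_pool(features):
    """Simplified attention (an assumption): weight each position by the
    softmax of its own activation, then take the weighted sum."""
    weights = softmax(features)
    return sum(w * f for w, f in zip(weights, features))


def score_window(peptide, kernels, out_weights, bias=0.0):
    """Return a cleavage score in (0, 1) for one sequence window."""
    x = one_hot(peptide)
    pooled = [attention_pool(conv1d(x, k)) for k in kernels]
    logit = bias + sum(w * p for w, p in zip(out_weights, pooled))
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical setup: one random filter per kernel width, untrained.
rng = random.Random(0)
kernel_widths = (1, 3, 5)
kernels = [
    [[rng.uniform(-0.5, 0.5) for _ in AMINO_ACIDS] for _ in range(width)]
    for width in kernel_widths
]
out_weights = [rng.uniform(-1.0, 1.0) for _ in kernels]

window = "GSDEVDGVDEVAKKKS"  # made-up 16-residue window, not a real substrate
print(score_window(window, kernels, out_weights))
```

Because the weights are random, the printed score carries no biological meaning. In a transfer-learning setting, one plausible arrangement would be to reuse the convolutional filters learned on one protease and refit only the output weights for another; whether DeepCleave does this is not stated in the abstract.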
