PCSPred_SC: Prediction of Protein Citrullination Sites Using an Effective Sequence-Based Combined Method

Protein citrullination, a post-translational modification (PTM), plays a crucial role in a diverse array of cellular processes and is implicated in many human pathologies. Accurate identification of protein citrullination sites (PCSs) is therefore needed to clarify the details of the reaction and the complex pathogenesis associated with citrullination. To address the limitations of existing PCS predictors, this study proposes a novel sequence-based combined method, PCSPred_SC, to further improve prediction performance. Various feature extraction methods are developed to mine sequence-derived biological information. Within the resulting feature space, the predictive capabilities of different prediction algorithms, over-sampling methods, and feature selection methods are each explored. Experimental results indicate that over-sampling is effective in addressing the imbalanced-dataset problem and that feature selection is important for removing irrelevant and redundant features. On the same dataset under 10-fold cross-validation, PCSPred_SC, which combines a support vector machine (SVM), ADASYN, and t-distributed stochastic neighbor embedding (t-SNE), substantially outperforms the competing methods while markedly reducing the number of features used. The proposed method is expected to provide useful information that broadens our knowledge of citrullination-related biological processes.
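As a rough illustration of what sequence-based feature extraction can look like for this task, the sketch below cuts a fixed-size window around every arginine (the residue that citrullination modifies) and encodes each window by its amino acid composition. The abstract does not name the descriptors actually used in PCSPred_SC, so the encoding, the window size (`flank`), the padding symbol, and the function names are all illustrative assumptions rather than the authors' method.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "X"  # hypothetical padding symbol for positions beyond the sequence ends


def arginine_windows(sequence, flank=3):
    """Return (1-based position, window) pairs centred on every arginine (R).

    `flank` residues are taken on each side; this window size is an
    illustrative assumption, not a value taken from the paper.
    """
    padded = PAD * flank + sequence + PAD * flank
    windows = []
    for i, residue in enumerate(sequence):
        if residue == "R":
            # Index i in `sequence` is index i + flank in `padded`,
            # so the window starts at padded[i].
            windows.append((i + 1, padded[i:i + 2 * flank + 1]))
    return windows


def amino_acid_composition(window):
    """Fraction of each of the 20 standard amino acids, ignoring padding."""
    residues = [aa for aa in window if aa in AMINO_ACIDS]
    counts = Counter(residues)
    total = len(residues) or 1  # avoid division by zero on all-padding input
    return [counts[aa] / total for aa in AMINO_ACIDS]


if __name__ == "__main__":
    for position, window in arginine_windows("MKRAGSRLV"):
        print(position, window, amino_acid_composition(window))
```

Each window yields a 20-dimensional vector. In a pipeline of the kind the abstract describes, vectors like these would then be passed to over-sampling, dimensionality reduction, and a classifier.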
