Opinion: Prediction of Protein Post-Translational Modification Sites: An Overview

Post-translational modification (PTM) refers to the covalent and enzymatic modification of proteins during or after protein biosynthesis. During protein biosynthesis, ribosomes translate mRNA into polypeptide chains, which may then undergo PTM to form the mature protein product [1]. PTM is a common biological mechanism in both eukaryotic and prokaryotic organisms. It regulates protein function, the proteolytic cleavage of regulatory subunits, and the degradation of entire proteins, and it affects all aspects of cellular life. The PTMs of a protein can also determine its cell signaling state, turnover, localization, and interactions with other proteins [2]. The analysis of proteins and their PTMs is therefore particularly important for the study of heart disease, cancer, neurodegenerative diseases, and diabetes [3,4]. Although characterizing PTMs provides invaluable insight into cellular functions in etiological processes, challenges remain. Technically, the major challenge in studying PTMs is the development of specific detection and purification methods.

[1]  Liam J. McGuffin, et al.  The PSIPRED protein structure prediction server , 2000, Bioinform..

[2]  Xianlin Han,et al.  Multi-dimensional mass spectrometry-based shotgun lipidomics and novel strategies for lipidomic analyses. , 2012, Mass spectrometry reviews.

[3]  Jianding Qiu,et al.  Systematic Analysis and Prediction of Pupylation Sites in Prokaryotic Proteins , 2013, PloS one.

[4]  M. Mann,et al.  Lysine Acetylation Targets Protein Complexes and Co-Regulates Major Cellular Functions , 2009, Science.

[5]  Y. Ishihama,et al.  Large-scale identification of phosphorylation sites for profiling protein kinase selectivity. , 2014, Journal of proteome research.

[6]  Lennart Martens,et al.  Protein structure as a means to triage proposed PTM sites , 2013, Proteomics.

[7]  M. Mann, et al.  Uncovering Global SUMOylation Signaling Networks in a Site-Specific Manner , 2014, Nature Structural & Molecular Biology.

[8]  Corinna Cortes,et al.  Support-Vector Networks , 1995, Machine Learning.

[9]  Wei Liu,et al.  First succinyl-proteome profiling of extensively drug-resistant Mycobacterium tuberculosis revealed involvement of succinylation in cellular physiology. , 2015, Journal of proteome research.

[10]  Md. Nurul Haque Mollah,et al.  SuccinSite: a computational tool for the prediction of protein succinylation sites by exploiting the amino acid patterns and properties. , 2016, Molecular bioSystems.

[11]  S.-W. Zhang,et al.  Prediction of protein homo-oligomer types by pseudo amino acid composition: Approached with an improved feature extraction and Naive Bayes Feature Fusion , 2006, Amino Acids.

[12]  R. Ranganathan,et al.  Evolutionarily conserved pathways of energetic connectivity in protein families. , 1999, Science.

[13]  Mike P. Liang,et al.  Structural characterization of proteins using residue environments , 2005, Proteins.

[14]  Edward L. Huttlin,et al.  Systematic and quantitative assessment of the ubiquitin-modified proteome. , 2011, Molecular cell.

[15]  Chun-Yuan Chen, et al.  Covalent Small Ubiquitin-like Modifier (SUMO) Modification of Maf1 Protein Controls RNA Polymerase III-dependent Transcription Repression , 2013, The Journal of Biological Chemistry.

[16]  Derek J. Bailey,et al.  One-hour proteome analysis in yeast , 2015, Nature Protocols.

[17]  A. Burlingame,et al.  Electron transfer dissociation (ETD): The mass spectrometric breakthrough essential for O‐GlcNAc protein site assignments—a study of the O‐GlcNAcylated protein Host Cell Factor C1 , 2013, Proteomics.

[18]  Hu Chen,et al.  A novel method for protein secondary structure prediction using dual‐layer SVM and profiles , 2004, Proteins.

[19]  G. Nelsestuen,et al.  Amino-terminal alanine functions in a calcium-specific process essential for membrane binding by prothrombin fragment 1. , 1988, Biochemistry.

[20]  Dong Xu,et al.  Computational Identification of Protein Methylation Sites through Bi-Profile Bayes Feature Extraction , 2009, PloS one.

[21]  P. Radivojac,et al.  Evaluation of features for catalytic residue prediction in novel folds , 2007 .

[22]  G. Demartino PUPylation: something old, something new, something borrowed, something Glu. , 2009, Trends in biochemical sciences.

[23]  Paul Tempst,et al.  Protein S-nitrosylation: a physiological signal for neuronal nitric oxide , 2001, Nature Cell Biology.

[24]  Kunihiko Fukushima,et al.  Cognitron: A self-organizing multilayered neural network , 1975, Biological Cybernetics.

[25]  C. Sander,et al.  Correlated Mutations and Residue Contacts , 1994 .

[26]  Eran Segal,et al.  Proteome-wide prediction of acetylation substrates , 2009, Proceedings of the National Academy of Sciences.

[27]  Kelley W. Moremen,et al.  Vertebrate protein glycosylation: diversity, synthesis and function , 2012, Nature Reviews Molecular Cell Biology.

[28]  N. Blom,et al.  Prediction of post‐translational glycosylation and phosphorylation of proteins from the amino acid sequence , 2004, Proteomics.

[29]  D. Umlauf,et al.  Site-specific analysis of histone methylation and acetylation. , 2004, Methods in molecular biology.

[30]  Ashok Sharma,et al.  Type 2 diabetes mellitus: phylogenetic motifs for predicting protein functional sites , 2007, Journal of Biosciences.

[31]  Yang Zhang,et al.  A comprehensive assessment of sequence-based and template-based methods for protein contact prediction , 2008, Bioinform..

[32]  Derek J. Bailey, et al.  The One Hour Yeast Proteome , 2013, Molecular & Cellular Proteomics.

[33]  Dariya S. Glazer,et al.  The FEATURE framework for protein function annotation: modeling new functions, improving performance, and extending to novel applications , 2008, BMC Genomics.

[34]  Ronenn Roubenoff,et al.  Convergent Random Forest predictor: methodology for predicting drug response from genome-scale data applied to anti-TNF response. , 2009, Genomics.

[35]  Changjiang Jin,et al.  CSS-Palm 2.0: an updated software for palmitoylation sites prediction. , 2008, Protein engineering, design & selection : PEDS.

[36]  Zexian Liu,et al.  GPS-YNO2: computational prediction of tyrosine nitration sites in proteins. , 2011, Molecular bioSystems.

[37]  J. Shabanowitz,et al.  Peptide and protein sequence analysis by electron transfer dissociation mass spectrometry. , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[38]  P. Simon Too Big to Ignore: The Business Case for Big Data , 2013 .

[39]  Chuan Wang,et al.  DescFold: A web server for protein fold recognition , 2009, BMC Bioinformatics.

[40]  C. Sander,et al.  Correlated mutations and residue contacts in proteins , 1994, Proteins.

[41]  Marvin Minsky,et al.  An introduction to computational geometry , 1969 .

[42]  Zhi-Ping Liu,et al.  Prediction of protein-RNA binding sites by a random forest method with combined features , 2010, Bioinform..

[43]  Hiroyuki Kurata,et al.  Computational identification of protein S-sulfenylation sites by incorporating the multiple sequence features information. , 2017, Molecular bioSystems.

[44]  D. Hand,et al.  Idiot's Bayes—Not So Stupid After All? , 2001 .

[45]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[46]  Dianjing Guo,et al.  A systematic identification of species-specific protein succinylation sites using joint element features information , 2017, International journal of nanomedicine.

[47]  Yong-Zi Chen,et al.  GANNPhos: a new phosphorylation site predictor based on a genetic algorithm integrated neural network. , 2007, Protein engineering, design & selection : PEDS.

[48]  Masaru Tomita,et al.  Microscale phosphoproteome analysis of 10,000 cells from human cancer cell lines. , 2011, Analytical chemistry.

[49]  R. Sheppard,et al.  Feline gastrin. An example of peptide sequence analysis by mass spectrometry. , 1969, Journal of the American Chemical Society.

[50]  Vikram Pudi,et al.  RBNBC: Repeat Based Naive Bayes Classifier for Biological Sequences , 2008, 2008 Eighth IEEE International Conference on Data Mining.

[51]  Gil Amitai,et al.  Network analysis of protein structures identifies functional residues. , 2004, Journal of molecular biology.

[52]  Yanjun Qi,et al.  Random Forest Similarity for Protein-Protein Interaction Prediction from Multiple Sources , 2004, Pacific Symposium on Biocomputing.

[53]  Jinyan Li,et al.  Computational Identification of Protein Pupylation Sites by Using Profile-Based Composition of k-Spaced Amino Acid Pairs , 2015, PloS one.

[54]  S. Brunak,et al.  Quantitative Phosphoproteomics Reveals Widespread Full Phosphorylation Site Occupancy During Mitosis , 2010, Science Signaling.

[55]  Katalin F Medzihradszky,et al.  Peptide sequence analysis. , 2005, Methods in enzymology.

[56]  Xin Yao,et al.  Diversity creation methods: a survey and categorisation , 2004, Inf. Fusion.

[57]  A. Burlingame, et al.  Global Identification and Characterization of Both O-GlcNAcylation and Phosphorylation at the Murine Synapse , 2012, Molecular & Cellular Proteomics.

[58]  Rafael A. Calvo,et al.  Accuracy and Diversity in Ensembles of Text Categorisers , 2005, CLEI Electron. J..

[59]  M. Sutter, et al.  Bacterial ubiquitin-like modifier Pup is deamidated and conjugated to substrates by distinct but homologous enzymes , 2009, Nature Structural & Molecular Biology.

[60]  C. Tung Prediction of pupylation sites using the composition of k-spaced amino acid pairs. , 2013, Journal of theoretical biology.

[61]  David E James,et al.  Re-fraction: a machine learning approach for deterministic identification of protein homologues and splice variants in large-scale MS-based proteomics. , 2012, Journal of proteome research.

[62]  Y. Zhang,et al.  Influence of succinylation on physicochemical property of yak casein micelles. , 2016, Food chemistry.

[63]  R. Polikar,et al.  Ensemble based systems in decision making , 2006, IEEE Circuits and Systems Magazine.

[64]  D. Opitz,et al.  Popular Ensemble Methods: An Empirical Study , 1999, J. Artif. Intell. Res..

[65]  Ian H. Witten,et al.  Data mining in bioinformatics using Weka , 2004, Bioinform..

[66]  Richard W. Aldrich,et al.  A perturbation-based method for calculating explicit likelihood of evolutionary co-variance in multiple sequence alignments , 2004, Bioinform..

[67]  B. Rost,et al.  Identifying cysteines and histidines in transition‐metal‐binding sites using support vector machines and neural networks , 2006, Proteins.

[68]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[69]  Philippe Bogaerts,et al.  Fast and accurate predictions of protein stability changes upon mutations using statistical potentials and neural networks: PoPMuSiC-2.0 , 2009, Bioinform..

[70]  Yingming Zhao,et al.  Lysine glutarylation is a protein posttranslational modification regulated by SIRT5. , 2014, Cell metabolism.