MDD-Palm: Identification of protein S-palmitoylation sites with substrate motifs based on maximal dependence decomposition

S-palmitoylation, the covalent attachment of a 16-carbon palmitic acid to a cysteine residue via a thioester linkage, is an important reversible lipid modification that plays a regulatory role in a variety of physiological and biological processes. As the number of experimentally identified S-palmitoylated peptides increases, it is imperative to investigate substrate motifs to facilitate the study of protein S-palmitoylation. Based on 710 non-homologous S-palmitoylation sites obtained from published databases and the literature, we carried out a bioinformatics investigation of S-palmitoylation sites based on amino acid composition. Two Sample Logo analysis indicates that positively charged and polar amino acids surrounding S-palmitoylated sites may be associated with the substrate site specificity of protein S-palmitoylation. Additionally, maximal dependence decomposition (MDD) was applied to explore the motif signatures of S-palmitoylation sites by categorizing a large-scale dataset into subgroups with statistically significant conservation of amino acids. We evaluated single features, namely amino acid composition (AAC), amino acid pair composition (AAPC), position-specific scoring matrix (PSSM), position weight matrix (PWM), amino acid substitution matrix (BLOSUM62), and accessible surface area (ASA), and assessed the effectiveness of incorporating MDD-identified substrate motifs into a two-layered prediction model. Five-fold cross-validation showed that, among support vector machine (SVM) models, a hybrid of AAC and PSSM features performed best at discriminating between S-palmitoylation and non-S-palmitoylation sites. The two-layered SVM model integrating MDD-identified substrate motifs performed well, with a sensitivity of 0.79, specificity of 0.80, accuracy of 0.80, and Matthews correlation coefficient (MCC) of 0.45.
Using an independent testing dataset (613 S-palmitoylated and 5412 non-S-palmitoylated sites) obtained from the literature, we demonstrated that the two-layered SVM model could outperform other prediction tools, yielding a balanced sensitivity and specificity of 0.690 and 0.694, respectively. This two-layered SVM model has been implemented as a web-based system (MDD-Palm), which is now freely available at http://csb.cse.yzu.edu.tw/MDDPalm/.
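
As a reading aid, the AAC feature and the four reported measures (sensitivity, specificity, accuracy, MCC) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `aac_features` and `evaluate`, the use of `-` as a padding character, and the confusion-matrix counts in the usage note are all hypothetical.

```python
import math

# The 20 standard amino acids, in a fixed order so feature vectors are comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac_features(window: str) -> list[float]:
    """Amino acid composition: fraction of each standard residue in a window.

    Characters outside the 20 standard residues (for example a '-' pad,
    which is an assumption of this sketch) are ignored.
    """
    residues = [r for r in window.upper() if r in AMINO_ACIDS]
    n = len(residues)
    return [residues.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def evaluate(tp: int, tn: int, fp: int, fn: int) -> dict[str, float]:
    """Sensitivity, specificity, accuracy and MCC from confusion-matrix counts."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal total is zero; report 0.0 in that case.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }
```

For example, `evaluate(tp=8, tn=9, fp=1, fn=2)` gives a sensitivity of 0.8 and a specificity of 0.9 on these made-up counts. Because MCC uses all four cells of the confusion matrix, it can be much lower than accuracy when negatives far outnumber positives, which is why it is reported alongside the other three measures.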
