ESA‐UbiSite: accurate prediction of human ubiquitination sites by identifying a set of effective negatives

Motivation: Numerous ubiquitination sites remain undiscovered because of the limitations of mass spectrometry‐based methods. Existing prediction methods train their models on randomly selected non‐validated sites, treating them as non‐ubiquitination sites even though some of them may be undiscovered ubiquitination sites.

Results: We propose an evolutionary screening algorithm (ESA) to select effective negatives among non‐validated sites, and an ESA‐based prediction method, ESA‐UbiSite, to identify human ubiquitination sites. The ESA selects the non‐validated sites that are least likely to be ubiquitination sites as training negatives. Both the ESA and ESA‐UbiSite use a set of well‐selected physicochemical properties together with a support vector machine for accurate prediction. Experimental results show that ESA‐UbiSite with effective negatives achieved a test accuracy of 0.92 and a Matthews correlation coefficient of 0.48, outperforming existing prediction methods. The ESA increased the test accuracy of ESA‐UbiSite from 0.75 to 0.92 and can improve other post‐translational modification site prediction methods.

Availability and Implementation: An ESA‐UbiSite‐based web server has been established at http://iclab.life.nctu.edu.tw/iclab_webtools/ESAUbiSite/.

Contact: syho@mail.nctu.edu.tw

Supplementary information: Supplementary data are available at Bioinformatics online.
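The Results report both a test accuracy and a Matthews correlation coefficient (MCC). As a minimal sketch of how these two metrics are computed from a binary confusion matrix, the Python below uses the standard definitions. The function names and the example counts are hypothetical and chosen for illustration only; they are not taken from the paper's data and do not reproduce its reported values.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all sites that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical, imbalanced example (assumption, not the paper's counts):
# 70 true ubiquitination sites and 930 true non-ubiquitination sites.
tp, tn, fp, fn = 30, 890, 40, 40
example_accuracy = accuracy(tp, tn, fp, fn)
example_mcc = mcc(tp, tn, fp, fn)
```

On imbalanced data such as this hypothetical example, accuracy is dominated by the majority class, whereas MCC uses all four cells of the confusion matrix, which is one reason to report both.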
