CarSPred: A Computational Tool for Predicting Carbonylation Sites of Human Proteins

Protein carbonylation is one of the most pervasive oxidative stress-induced post-translational modifications (PTMs) and plays a significant role in the etiology and progression of several human diseases. It is regarded as a biomarker of oxidative stress because it forms relatively early and is more stable than other oxidative PTMs. Only a subset of proteins is prone to carbonylation, and most carbonyl groups are formed from lysine (K), arginine (R), threonine (T) and proline (P) residues. Recent advances in the mass spectrometric analysis of this PTM have provided new insights into the mechanisms of protein carbonylation, such as protein susceptibility and exact modification sites. However, experimental approaches to identifying carbonylation sites are costly and time-consuming and can process only a limited number of proteins, and so far no bioinformatics method or tool has been devoted to predicting carbonylation sites of human proteins. In this paper, a computational method is proposed to identify carbonylation sites of human proteins. The method extracts four kinds of features and combines the minimum Redundancy Maximum Relevance (mRMR) feature selection criterion with a weighted support vector machine (WSVM), achieving total accuracies of 85.72%, 85.95%, 83.92% and 85.72% for K, R, T and P carbonylation site predictions, respectively, under 10-fold cross-validation. The final optimal feature sets are analysed, and the position-specific composition and hydrophobicity environment of the residues flanking the modification sites are discussed. In addition, a software tool named CarSPred has been developed to facilitate the application of the method. The datasets and software described in the paper are available at https://sourceforge.net/projects/hqlstudio/files/CarSPred-1.0/.
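
As a rough illustration of what "flanking residues of modification sites" means in practice, the sketch below enumerates every K, R, T and P residue in a protein sequence and cuts out a fixed-length window around it. It is not the CarSPred implementation: the function name `candidate_windows`, the flank length of 7 and the padding character `X` are assumptions chosen for the example, and the abstract does not state the window size actually used.

```python
from typing import Iterator, Tuple

# Residue types named in the abstract as the main carbonylation targets.
TARGET_RESIDUES = set("KRTP")


def candidate_windows(
    sequence: str, flank: int = 7, pad: str = "X"
) -> Iterator[Tuple[int, str, str]]:
    """Yield (1-based position, residue, window) for every K, R, T or P.

    `flank` and `pad` are illustrative assumptions, not values from the paper.
    Windows near either terminus are padded so all have length 2 * flank + 1.
    """
    seq = sequence.upper()
    padded = pad * flank + seq + pad * flank
    for i, residue in enumerate(seq):
        if residue in TARGET_RESIDUES:
            # Index i in `seq` sits at index i + flank in `padded`,
            # so the window centred on it starts at index i.
            yield i + 1, residue, padded[i : i + 2 * flank + 1]


if __name__ == "__main__":
    for position, residue, window in candidate_windows("MKTAP", flank=2):
        print(position, residue, window)
```

Each window could then be encoded into numerical features and passed to a feature selector and classifier; those steps are left out here because the abstract does not specify them in enough detail.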
