Incorporating g-gap dipeptide composition and position specific scoring matrix for identifying antioxidant proteins

Oxidative stress can damage major cell components, including proteins, DNA, lipids, and cell membranes, which may cause cells to lose function and can induce a wide variety of diseases. As a widespread class of antioxidants in humans and animals, antioxidant proteins are essential for limiting the cell damage and aging caused by oxidative stress. Accurate identification of antioxidant proteins is therefore an important step toward understanding the causes and physiological processes of certain diseases and of aging. Furthermore, newly identified antioxidant proteins may provide candidate targets for curing or alleviating diseases and for slowing the aging process. In this study, a random forest-based approach that incorporates the position-specific scoring matrix (PSSM) and g-gap dipeptide composition is proposed to distinguish antioxidant proteins from non-antioxidant proteins. To further improve prediction performance, information gain combined with incremental feature selection is used to obtain an optimal feature subset. Compared with prior studies on the testing dataset, the proposed method shows excellent predictive performance, with an accuracy of 0.807, an MCC of 0.543, and an AUC of 0.939. These results indicate that the method may serve as an alternative predictor for annotating antioxidant proteins.
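The g-gap dipeptide composition mentioned above can be illustrated with a minimal sketch. It assumes the commonly used definition, in which a g-gap dipeptide is a pair of residues separated by g intervening positions and each of the 400 possible pairs is reported as a relative frequency. The function name, the handling of non-standard residues, and the normalization by the number of valid pairs are assumptions made for illustration, not details taken from the study.

```python
from itertools import product

# The 20 standard amino acids; the ordering here is an arbitrary choice.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def g_gap_dipeptide_composition(sequence, g=0):
    """Return a 400-dimensional vector of g-gap dipeptide frequencies.

    A g-gap dipeptide pairs residue i with residue i + g + 1, so g = 0
    gives the ordinary adjacent dipeptide composition.
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = 0
    for i in range(len(sequence) - g - 1):
        pair = sequence[i] + sequence[i + g + 1]
        # Assumption: pairs containing non-standard residues are skipped.
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        # Sequence too short for this gap, or no valid pairs found.
        return [0.0] * len(pairs)
    return [counts[p] / total for p in pairs]
```

For example, `g_gap_dipeptide_composition("ACAC", g=1)` considers the pairs (A, A) and (C, C), so the entries for "AA" and "CC" each hold half of the total frequency. Vectors of this kind could then be concatenated with PSSM-derived features before feature selection and classification.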