AOPs-SVM: A Sequence-Based Classifier of Antioxidant Proteins Using a Support Vector Machine

Antioxidant proteins play important roles in countering oxidative damage in organisms. Identifying them accurately through biological experiments is challenging because such experiments are time-consuming and costly. We therefore propose AOPs-SVM, a machine-learning model built on sequence features and a support vector machine. In a jackknife cross-validation test on a testing dataset, the AOPs-SVM classifier achieved a sensitivity of 0.68, a specificity of 0.985, an average accuracy of 0.942, an MCC of 0.741, and an AUC of 0.832, outperforming existing classifiers. These experimental results indicate that AOPs-SVM is an effective classifier that can contribute to research on antioxidant proteins. A web server is openly accessible at http://server.malab.cn/AOPs-SVM/index.jsp.
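
The sensitivity, specificity, accuracy, and MCC quoted above are presumably the standard confusion-matrix quantities. The following is a minimal sketch of those definitions, assuming a binary labeling in which antioxidant proteins are the positive class. The function name `binary_metrics` and the counts are hypothetical illustrations, not the authors' code or data.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Compute standard binary-classification metrics from confusion counts.

    tp/fn: positives (assumed here: antioxidant proteins) predicted right/wrong.
    tn/fp: negatives predicted right/wrong.
    """
    sensitivity = tp / (tp + fn)                 # true positive rate
    specificity = tn / (tn + fp)                 # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)   # overall fraction correct
    # Matthews correlation coefficient; defined as 0 when a marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only (not taken from the paper).
metrics = binary_metrics(tp=8, fn=2, tn=18, fp=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because MCC combines all four confusion counts, it remains informative when negatives greatly outnumber positives, a setting in which high specificity can coexist with modest sensitivity, as in the figures reported above.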
