Use Chou's 5-Step Rule to Predict DNA-Binding Proteins with Evolutionary Information

The knowledge of DNA-binding proteins would help to understand the functions of proteins better in cellular biological processes. Research on the prediction of DNA-binding proteins can promote the research of drug proteins and computer acidified drugs. In recent years, methods based on machine learning are usually used to predict proteins. Although great predicted performance can be achieved via current methods, researchers still need to invest more research in terms of the improvement of predicted performance. In this study, the prediction of DNA-binding proteins is studied from the perspective of evolutionary information and the support vector machine method. One machine learning model for predicting DNA-binding proteins based on evolutionary features by using Chou's 5-step rule is put forward. The results show that great predicted performance is obtained on benchmark dataset PDB1075 and independent dataset PDB186, achieving the accuracy of 86.05% and 75.30%, respectively. Thus, the method proposed is comparable to a certain degree, and it may work even better than other methods to some extent.

[1]  Xue-wen Chen,et al.  On Position-Specific Scoring Matrix for Protein Function Prediction , 2011, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[2]  Corinna Cortes,et al.  Support-Vector Networks , 1995, Machine Learning.

[3]  Engelbert Buxbaum,et al.  Protein Secondary Structure , 2011 .

[4]  Marco Biasini,et al.  SWISS-MODEL: modelling protein tertiary and quaternary structure using evolutionary information , 2014, Nucleic Acids Res..

[5]  R. Brereton,et al.  Support vector machines for classification and regression. , 2010, The Analyst.

[6]  Michael Y. Galperin,et al.  Towards understanding the first genome sequence of a crenarchaeon by genome annotation using clusters of orthologous groups of proteins (COGs) , 2000, Genome Biology.

[7]  Jeffrey Skolnick,et al.  DBD-Hunter: a knowledge-based method for the prediction of DNA–protein interactions , 2008, Nucleic acids research.

[8]  Shinn-Ying Ho,et al.  Predicting and analyzing DNA-binding domains using a systematic approach to identifying a set of informative physicochemical and biochemical properties , 2011, BMC Bioinformatics.

[9]  H. Mohabatkar,et al.  Predicting anticancer peptides with Chou's pseudo amino acid composition and investigating their mutagenicity via Ames test. , 2014, Journal of theoretical biology.

[10]  David Baker,et al.  Protein structure prediction and analysis using the Robetta server , 2004, Nucleic Acids Res..

[11]  Thomas L. Madden,et al.  Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements. , 2001, Nucleic acids research.

[12]  Bin Liu,et al.  Identification of DNA-binding proteins by auto-cross covariance transformation , 2015, 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM).

[13]  William Stafford Noble,et al.  Support vector machine learning from heterogeneous data: an empirical analysis using protein sequence and structure , 2006, Bioinform..

[14]  J. Thornton,et al.  An overview of the structures of protein-DNA complexes , 2000, Genome Biology.

[15]  Gajendra P. S. Raghava,et al.  Identification of DNA-binding proteins using support vector machines and evolutionary profiles , 2007, BMC Bioinformatics.

[16]  Yin Wang,et al.  RVMAB: Using the Relevance Vector Machine Model Combined with Average Blocks to Predict the Interactions of Proteins from Protein Sequences , 2016, International journal of molecular sciences.

[17]  Cheng Chen,et al.  β-Barrel Transmembrane Protein Predicting Using Support Vector Machine , 2017, ICIC.

[18]  Qian-Zhong Li,et al.  Annotating the protein-RNA interaction sites in proteins using evolutionary information and protein backbone structure. , 2012, Journal of theoretical biology.

[19]  B. Liu,et al.  PseDNA‐Pro: DNA‐Binding Protein Identification by Combining Chou’s PseAAC and Physicochemical Distance Transformation , 2015, Molecular informatics.

[20]  K. Chou,et al.  iDNA-Prot: Identification of DNA Binding Proteins Using Random Forest with Grey Model , 2011, PloS one.

[21]  Jian Song,et al.  Identification of DNA–protein Binding Sites through Multi-Scale Local Average Blocks on Sequence Information , 2017, Molecules.

[22]  B. Liu,et al.  DNA binding protein identification by combining pseudo amino acid composition and profile-based protein representation , 2015, Scientific Reports.

[23]  Loris Nanni,et al.  Wavelet images and Chou’s pseudo amino acid composition for protein classification , 2011, Amino Acids.

[24]  Quan Zou,et al.  A Review of DNA-binding Proteins Prediction Methods , 2019, Current Bioinformatics.

[25]  Maria Jesus Martin,et al.  The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003 , 2003, Nucleic Acids Res..

[26]  Xiaolong Wang,et al.  Identification of DNA-binding proteins by incorporating evolutionary information into pseudo amino acid composition via the top-n-gram approach , 2015, Journal of biomolecular structure & dynamics.

[27]  Xiaolong Wang,et al.  Identifying DNA-binding proteins by combining support vector machine and PSSM distance transformation , 2015, BMC Systems Biology.

[28]  Quan Zou Editorial (Thematic Issue: Machine Learning Techniques for Protein Structure, Genomics Function Analysis and Disease Prediction) , 2016 .

[29]  P. N. Suganthan,et al.  DNA-Prot: Identification of DNA Binding Proteins from Protein Sequence Information using Random Forest , 2009, Journal of biomolecular structure & dynamics.

[30]  Loris Nanni,et al.  An Empirical Study of Different Approaches for Protein Classification , 2014, TheScientificWorldJournal.

[31]  Xinghao Yu,et al.  Jackknife Model Averaging Prediction Methods for Complex Phenotypes with Gene Expression Levels by Integrating External Pathway Information , 2018, bioRxiv.

[32]  Lukasz A. Kurgan,et al.  DFLpred: High-throughput prediction of disordered flexible linker regions in protein sequences , 2016, Bioinform..

[33]  B. Liu,et al.  iDNA-Prot|dis: Identifying DNA-Binding Proteins by Incorporating Amino Acid Distance-Pairs and Reduced Alphabet Profile into the General Pseudo Amino Acid Composition , 2014, PloS one.

[34]  Zhaolei Zhang,et al.  Computational learning on specificity-determining residue-nucleotide interactions , 2015, Nucleic acids research.