A Prediction Method of DNA-Binding Proteins Based on Evolutionary Information

DNA is the carrier of genetic information in organisms. DNA-binding proteins, a class that includes unwinding enzymes, play a key role in many molecular functions, and this importance has motivated extensive research into methods for identifying them. In recent years, researchers have developed machine-learning-based methods that predict DNA-binding proteins quickly and accurately. Although the accuracy of current methods is considerable, their prediction performance can still be improved. In this paper, we propose a DNA-binding protein prediction model based on PSSM (Position-Specific Scoring Matrix) features and a Random Forest classifier. Experimental results show that the proposed method performs well on the PDB1075 and PDB186 datasets, with accuracies of 82.14% and 79.0%, respectively. The method is comparable to existing methods and surpasses them on some datasets.
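As a rough illustration of how a PSSM can be turned into input for a classifier, the sketch below collapses a variable-length PSSM (one row of 20 scores per residue) into a fixed-length 20-dimensional vector. The abstract does not say which PSSM-derived features the paper uses, so the sigmoid scaling and column averaging here are assumptions chosen for simplicity, and the function name `pssm_column_means` is hypothetical.

```python
import math

# Column order used in PSI-BLAST ASCII PSSM output.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def sigmoid(x):
    """Squash a raw PSSM score into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def pssm_column_means(pssm):
    """Collapse an L x 20 PSSM into a 20-dimensional feature vector.

    Assumption (not taken from the paper): each score is sigmoid-scaled,
    then each of the 20 columns is averaged over all L residues.
    """
    if not pssm:
        raise ValueError("PSSM must contain at least one row")
    if any(len(row) != len(AMINO_ACIDS) for row in pssm):
        raise ValueError("every PSSM row must have 20 scores")
    n_rows = len(pssm)
    return [
        sum(sigmoid(row[j]) for row in pssm) / n_rows
        for j in range(len(AMINO_ACIDS))
    ]
```

Because the output length does not depend on sequence length, vectors like these could be stacked into a feature matrix and passed to a Random Forest implementation such as scikit-learn's `RandomForestClassifier`.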
