DBP-GAPred: An intelligent method for prediction of DNA-binding proteins types by enhanced evolutionary profile features with ensemble learning

DNA-binding proteins (DBPs) play an influential role in diverse biological activities such as DNA replication, slicing, repair, and transcription. Some DBPs are indispensable for understanding many types of human cancer (e.g. lung, breast, and liver cancer) and chronic diseases (e.g. AIDS/HIV, asthma), while others are involved in the design of antibiotics, steroids, and anti-inflammatory drugs. These crucial processes are closely related to DBP types. DBPs are categorized into single-stranded DNA-binding proteins (ssDBPs) and double-stranded DNA-binding proteins (dsDBPs). A few computational predictors have been reported for discriminating ssDBPs from dsDBPs. However, owing to the limitations of the existing methods, an intelligent computational system is still highly desirable. In this work, features are extracted from protein sequences by extending the notions of dipeptide composition (DPC), evolutionary difference formula (EDF), and K-separated bigram (KSB) into the position-specific scoring matrix (PSSM). The most informative content of these features was encoded with a compression approach, the discrete cosine transform (DCT), and the model was trained with a support vector machine (SVM). The prediction performance was further boosted by a genetic algorithm (GA) ensemble strategy. The novel predictor (DBP-GAPred) achieved 1.89%, 0.28%, and 6.63% higher accuracies on the jackknife, 10-fold, and independent dataset tests, respectively, than the best existing predictor. These outcomes confirm the superiority of our method over the existing predictors.
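
To make the feature pipeline more concrete, the following is a minimal sketch of two of the steps named above: a K-separated bigram computed on a PSSM, and DCT-based compression of the PSSM. It is an illustration under stated assumptions, not the authors' implementation. The function names (`k_separated_bigram`, `dct_matrix`, `dct_compress`), the unnormalized bigram sum, the use of a 2D orthonormal DCT-II, and the `keep_rows` parameter are all hypothetical choices made for this example; the abstract does not specify them. The PSSM is assumed to be an array of shape (L, 20), where L is the sequence length.

```python
import numpy as np


def k_separated_bigram(pssm: np.ndarray, k: int = 1) -> np.ndarray:
    """Return a 400-dim K-separated bigram vector for an (L, 20) PSSM.

    Assumed definition: B[m, n] = sum_i P[i, m] * P[i + k, n].
    With k = 1 this reduces to a dipeptide-style composition on the PSSM.
    """
    length = pssm.shape[0]
    if not 1 <= k < length:
        raise ValueError("k must satisfy 1 <= k < sequence length")
    # Rows i and i + k are paired by slicing, then summed over i.
    bigram = pssm[:-k].T @ pssm[k:]
    return bigram.ravel()


def dct_matrix(n: int) -> np.ndarray:
    """Return the n x n orthonormal DCT-II transform matrix."""
    rows = np.arange(n)[:, None]   # frequency index
    cols = np.arange(n)[None, :]   # sample index
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * cols + 1) * rows / (2 * n))
    mat[0] /= np.sqrt(2.0)         # scale the DC row so the matrix is orthonormal
    return mat


def dct_compress(pssm: np.ndarray, keep_rows: int = 10) -> np.ndarray:
    """Apply a 2D DCT to the PSSM and keep the lowest-frequency rows.

    keep_rows is a hypothetical compression setting; the result has
    keep_rows * 20 values regardless of the sequence length.
    """
    length, width = pssm.shape
    if length < keep_rows:
        raise ValueError("sequence is shorter than keep_rows")
    coeffs = dct_matrix(length) @ pssm @ dct_matrix(width).T
    return coeffs[:keep_rows].ravel()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    toy_pssm = rng.normal(size=(50, 20))  # stand-in for a real PSSM
    features = np.concatenate(
        [k_separated_bigram(toy_pssm, k=2), dct_compress(toy_pssm)]
    )
    print(features.shape)
```

Both functions return fixed-length vectors, which is what allows proteins of different lengths to be fed to a classifier such as an SVM. The random matrix in the usage example only stands in for a real PSSM.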
