Predicting ATP-Binding Cassette Transporters Using the Random Forest Method

ATP-binding cassette (ABC) proteins play important roles in a wide variety of species. These proteins are involved in absorbing nutrients, exporting toxic substances, and regulating potassium channels, and they contribute to drug resistance in cancer cells. Therefore, the identification of ABC transporters is an urgent task. The present study used 188D as the feature extraction method, which is based on sequence information and physicochemical properties. We also visualized the extracted features using t-Distributed Stochastic Neighbor Embedding (t-SNE); the visualization indicates that samples represented by 188D features can be separated. Random forest (RF), an efficient classifier for identifying proteins, was then trained on these features. Under 10-fold cross-validation on the training set, the proposed model achieved an average accuracy of 89.54% across the 10 folds, with a specificity of 0.87, a sensitivity of 0.92, and a Matthews correlation coefficient (MCC) of 0.79. On the testing set, the accuracy was 89%. These results suggest that the model combining 188D with RF is an effective tool for identifying ABC transporters.
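For readers less familiar with the reported metrics, the following is a minimal sketch of how accuracy, sensitivity, specificity, and MCC are derived from a binary confusion matrix. The function name `binary_metrics` and the example counts are hypothetical. The counts are illustrative only, chosen so that the ratios resemble the values quoted above. They are not the study's actual confusion matrix.

```python
import math


def safe_div(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion counts.

    tp/fn: positives (e.g. ABC transporters) predicted correctly/incorrectly.
    tn/fp: negatives predicted correctly/incorrectly.
    """
    accuracy = safe_div(tp + tn, tp + fp + tn + fn)
    sensitivity = safe_div(tp, tp + fn)   # true positive rate
    specificity = safe_div(tn, tn + fp)   # true negative rate
    # Matthews correlation coefficient: ranges from -1 to 1.
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = safe_div(tp * tn - fp * fn, mcc_denominator)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Hypothetical counts for illustration (NOT taken from the study).
metrics = binary_metrics(tp=92, fp=13, tn=87, fn=8)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In a cross-validation setting, these metrics would typically be computed per fold and then averaged, which is how an average accuracy over 10 folds is usually obtained.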
