Predicting membrane protein types using various decision tree classifiers based on various modes of general PseAAC for imbalanced datasets.

Predicting membrane protein types is an important and challenging research area in bioinformatics and proteomics. Membrane protein types have traditionally been classified with biophysical methods. Because databases now hold a large and growing number of uncharacterized protein sequences, these traditional methods are very time-consuming, expensive, and susceptible to errors. Hence, it is highly desirable to develop a robust, reliable, and efficient method to predict membrane protein types. Decision tree classifiers often handle imbalanced and large datasets well. Because the datasets used here are imbalanced, the performance of various decision tree classifiers is analysed: Decision Tree (DT), Classification And Regression Tree (CART), C4.5, Random tree, and REP (Reduced Error Pruning) tree, together with the ensemble methods Adaboost, RUS (Random Under Sampling) boost, Rotation forest, and Random forest. Among these classifiers, Random forest performs well, reaching an accuracy of 96.35% in less time. A further finding is that the RUS boost decision tree classifier is able to classify one or two samples in the class with very few samples, whereas the other classifiers, such as DT, Adaboost, Rotation forest, and Random forest, are not sensitive to the classes with fewer samples. The performance of the decision tree classifiers is also compared with that of SVM (Support Vector Machine) and Naive Bayes classifiers.
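The point about minority classes can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy labels, the function names `random_undersample` and `per_class_recall`, and the choice to cut every class down to the size of the rarest class are all assumptions made for illustration. It shows only the random undersampling step that gives RUS boost its name (not the boosting loop), and why overall accuracy can look good while small classes are never recognised.

```python
import random
from collections import Counter, defaultdict


def random_undersample(labels, seed=0):
    """Return sorted indices of a class-balanced subset.

    Assumption for this sketch: every class is reduced to the size of
    the rarest class by sampling without replacement.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    target = min(len(idxs) for idxs in by_class.values())
    kept = []
    for label in sorted(by_class):
        kept.extend(rng.sample(by_class[label], target))
    return sorted(kept)


def per_class_recall(y_true, y_pred):
    """Fraction of each true class that was predicted correctly."""
    total = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / total[c] for c in total}


# Hypothetical imbalanced label set: one large class, two small ones.
labels = ["A"] * 90 + ["B"] * 8 + ["C"] * 2

# A classifier that always predicts the majority class.
majority_pred = ["A"] * len(labels)
accuracy = sum(t == p for t, p in zip(labels, majority_pred)) / len(labels)
recall = per_class_recall(labels, majority_pred)

# Class counts after undersampling to the rarest class.
balanced_idx = random_undersample(labels)
balanced_counts = Counter(labels[i] for i in balanced_idx)

print(accuracy)
print(recall)
print(balanced_counts)
```

On this toy data the majority-only predictor is correct on 90 of 100 samples yet recovers none of the samples in classes B and C, which is why per-class results matter alongside overall accuracy. After undersampling, each class contributes the same number of training samples, so a learner trained on that subset can no longer ignore the small classes at no cost.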


[40]  Jian-Ding Qiu,et al.  Prediction of the Types of Membrane Proteins Based on Discrete Wavelet Transform and Support Vector Machines , 2010, The protein journal.

[41]  K. Chou,et al.  Prediction of membrane protein types and subcellular locations , 1999, Proteins.

[42]  K. Chou,et al.  pRNAm-PC: Predicting N(6)-methyladenosine sites in RNA sequences via physical-chemical properties. , 2016, Analytical biochemistry.

[43]  Bin Liu,et al.  Pse-in-One 2.0: An Improved Package of Web Servers for Generating Various Modes of Pseudo Components of DNA, RNA, and Protein Sequences , 2017 .

[44]  Asifullah Khan,et al.  Predicting membrane protein types by fusing composite protein sequence features into pseudo amino acid composition. , 2011, Journal of theoretical biology.

[45]  K. Chou,et al.  Application of SVM to predict membrane protein types. , 2004, Journal of theoretical biology.

[46]  Kuo-Chen Chou,et al.  Using optimized evidence-theoretic K-nearest neighbor classifier and pseudo-amino acid composition to predict membrane protein types. , 2005, Biochemical and biophysical research communications.

[47]  H.-B. Shen,et al.  Using ensemble classifier to identify membrane protein types , 2006, Amino Acids.

[48]  Junjie Chen,et al.  Pse-in-One: a web server for generating various modes of pseudo components of DNA, RNA, and protein sequences , 2015, Nucleic Acids Res..

[49]  Nazar Zaki,et al.  Predicting Membrane Proteins Type Using Inter-domain Linker Knowledge , 2010, BIOCOMP.

[50]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[51]  K. Chou Structural bioinformatics and its impact to biomedical science. , 2004, Current medicinal chemistry.

[52]  Kuo-Chen Chou,et al.  Prediction of Membrane Protein Types by Incorporating Amphipathic Effects , 2005, J. Chem. Inf. Model..

[53]  Yixue Li,et al.  Prediction of membrane protein types in a hybrid space. , 2008, Journal of proteome research.

[54]  K. Chou Impacts of bioinformatics to medicinal chemistry. , 2015, Medicinal chemistry (Shariqah (United Arab Emirates)).

[55]  Kuo-Chen Chou,et al.  iATC-mHyb: a hybrid multi-label classifier for predicting the classification of anatomical therapeutic chemicals , 2017, Oncotarget.

[56]  Kuo-Chen Chou,et al.  Prediction of the Tertiary Structure of the β-Secretase Zymogen☆ , 2002 .

[57]  Kuo-Chen Chou,et al.  Insights from modeling the tertiary structure of human BACE2. , 2004, Journal of proteome research.

[58]  Lukasz Kurgan,et al.  Amino Acid Sequence Based Method for Prediction of Cell Membrane Protein Types , 2008 .

[59]  Kuo-Chen Chou,et al.  An Unprecedented Revolution in Medicinal Chemistry Driven by the Progress of Biological Science. , 2017, Current topics in medicinal chemistry.

[60]  Parviz Abdolmaleki,et al.  Prediction of membrane protein types by means of wavelet analysis and cascaded neural networks. , 2008, Journal of theoretical biology.

[61]  Maqsood Hayat,et al.  Author ' s Accepted Manuscript Classification of membrane protein types using Voting feature interval in combination with Chou ' s pseudo amino acid composition , 2015 .

[62]  Jia He,et al.  Improving discrimination of outer membrane proteins by fusing different forms of pseudo amino acid composition. , 2010, Analytical biochemistry.

[63]  Liangliang Kong,et al.  Architecture of the Mitochondrial Calcium Uniporter , 2016, Nature.

[64]  Chao Wang,et al.  ProClusEnsem: Predicting membrane protein types by fusing different modes of pseudo amino acid composition , 2012, Comput. Biol. Medicine.

[65]  Kuo-Bin Li,et al.  Predicting membrane protein types by incorporating protein topology, domains, signal peptides, and physicochemical properties into the general form of Chou's pseudo amino acid composition. , 2013, Journal of theoretical biology.

[66]  Kuo-Chen Chou,et al.  iATC‐mISF: a multi‐label classifier for predicting the classes of anatomical therapeutic chemicals , 2016, Bioinform..

[67]  Kuo-Chen Chou,et al.  Using GO-PseAA predictor to identify membrane proteins and their types. , 2005, Biochemical and biophysical research communications.

[68]  K. Chou Prediction of protein cellular attributes using pseudo‐amino acid composition , 2001, Proteins.

[69]  Xuhui Chen,et al.  The prediction of membrane protein types with NPE , 2010, IEICE Electron. Express.

[70]  Kuo-Chen Chou,et al.  Using stacked generalization to predict membrane protein types based on pseudo-amino acid composition. , 2006, Journal of theoretical biology.

[71]  Asifullah Khan,et al.  MemHyb: predicting membrane protein types by hybridizing SAAC and PSSM. , 2012, Journal of theoretical biology.

[72]  Wei Chen,et al.  iRSpot-PseDNC: identify recombination spots with pseudo dinucleotide composition , 2013, Nucleic acids research.

[73]  Xiao-ming Hu,et al.  Geometry preserving projections algorithm for predicting membrane protein types. , 2010, Journal of theoretical biology.

[74]  Xuan Xiao,et al.  iMem-Seq: A Multi-label Learning Classifier for Predicting Membrane Proteins Types , 2015, The Journal of Membrane Biology.

[75]  Samad Jahandideh,et al.  Application of density similarities to predict membrane protein types based on pseudo-amino acid composition. , 2011, Journal of theoretical biology.

[76]  Xiang Cheng,et al.  iDrug-Target: predicting the interactions between drug compounds and target proteins in cellular networking via benchmark dataset optimization approach , 2015, Journal of biomolecular structure & dynamics.

[77]  Meng Wang,et al.  SLLE for predicting membrane protein types. , 2005, Journal of theoretical biology.

[78]  Meng Wang,et al.  Using Fourier Spectrum Analysis and Pseudo Amino Acid Composition for Prediction of Membrane Protein Types , 2005, The protein journal.

[79]  M. Wang,et al.  Weighted-support vector machines for predicting membrane protein types based on pseudo-amino acid composition. , 2004, Protein engineering, design & selection : PEDS.

[80]  S. Harrison,et al.  Mitochondrial uncoupling protein 2 structure determined by NMR molecular fragment searching , 2011, Nature.

[81]  C. Zhang,et al.  Prediction of Membrane Protein Types Based on the Hydrophobic Index of Amino Acids , 2000, Journal of protein chemistry.