Accurate classification of membrane protein types based on sequence and evolutionary information using deep learning

Membrane proteins play an important role in the life activities of organisms, and knowing a membrane protein's type provides clues to its structure and function. Although various computational methods for predicting membrane protein types have been developed, their results still fall short of researchers' expectations. We propose two deep learning models, one that processes sequence information and one that processes evolutionary information. Both models obtain better results than traditional machine learning models. To further improve the sequence information model, we introduce a new vector representation to replace one-hot encoding, which improves the overall success rate by 3.81% and 6.55% on two datasets. Finally, fusing the two models yields a more effective model whose overall success rate reaches 95.68% and 92.98% on the two datasets. The experimental results show that our method predicts membrane protein types more effectively than existing methods and can help laboratory researchers identify the types of novel membrane proteins.
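
As a rough illustration of two ideas mentioned above, the following minimal Python sketch shows a one-hot encoding of an amino acid sequence (the baseline representation that the new vector representation replaces) and one possible way to fuse the class probabilities of two models. The abstract does not state how the fusion is performed, so the weighted average used here is an assumption, and all names (`one_hot_encode`, `fuse_probabilities`, `predict_type`, `weight`) are hypothetical.

```python
# Illustrative sketch only: the function names and the averaging rule are
# assumptions for this example, not the method described in the paper.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence):
    """Return one 20-dimensional 0/1 vector per residue.

    Residues outside the 20 standard amino acids map to an all-zero vector.
    """
    rows = []
    for residue in sequence.upper():
        row = [0] * len(AMINO_ACIDS)
        if residue in INDEX:
            row[INDEX[residue]] = 1
        rows.append(row)
    return rows


def fuse_probabilities(p_sequence, p_evolution, weight=0.5):
    """Weighted average of two class-probability vectors (assumed fusion rule)."""
    if len(p_sequence) != len(p_evolution):
        raise ValueError("probability vectors must have the same length")
    return [weight * a + (1 - weight) * b
            for a, b in zip(p_sequence, p_evolution)]


def predict_type(p_sequence, p_evolution, weight=0.5):
    """Return the index of the class with the highest fused probability."""
    fused = fuse_probabilities(p_sequence, p_evolution, weight)
    return max(range(len(fused)), key=fused.__getitem__)
```

For example, `one_hot_encode("AC")` yields two vectors with a single 1 at positions 0 and 1 respectively, and `predict_type` picks the class favoured by the averaged outputs of the two models. A learned embedding would replace each sparse 20-dimensional vector with a dense one, but its exact form is not given in the abstract.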
