DeepSVM-fold: protein fold recognition by combining support vector machines and pairwise sequence similarity scores generated by deep learning networks

Protein fold recognition is critical for studying the structures and functions of proteins. Existing fold recognition approaches fail to efficiently calculate pairwise sequence similarity scores for proteins that belong to the same fold but share low sequence similarity. Furthermore, existing feature vectorization strategies cannot measure the global relationships among proteins from different folds. In this article, we propose a new computational predictor called DeepSVM-fold for protein fold recognition, which introduces a new feature vector based on the pairwise sequence similarity scores calculated from the fold-specific features extracted by deep learning networks. The feature vectors are then fed into a support vector machine to construct the predictor. Experimental results on the benchmark dataset (LE) show that DeepSVM-fold outperforms all the other competing methods.
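
The feature vectorization idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the fold-specific features are stood in for by small hand-written vectors, cosine similarity is an assumed choice of pairwise score, and the names `cosine`, `similarity_feature_vector`, and `train_features` are hypothetical. Each protein is represented by its similarity scores against every protein in a reference (training) set.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_feature_vector(query, references):
    """Represent one protein by its similarity score to each reference protein."""
    return [cosine(query, ref) for ref in references]


# Hypothetical fold-specific features; in the paper's setting these would be
# extracted by deep learning networks rather than written by hand.
train_features = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 2.0],
]

# One row per protein, one column per reference protein.
vectors = [similarity_feature_vector(f, train_features) for f in train_features]
```

Rows of `vectors` have a fixed length equal to the size of the reference set, so they could be passed, together with fold labels, to any support vector machine implementation.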
