Recent Progress in Machine Learning-Based Methods for Protein Fold Recognition

Knowledge of protein folding has a profound impact on understanding the heterogeneity and molecular function of proteins, and in turn facilitates drug design. Predicting the 3D structure (fold) of a protein is a key problem in molecular biology. Determining the fold of a protein relies mainly on experimental methods. With the development of next-generation sequencing techniques, new protein sequences are being discovered at a rapidly increasing rate. Determining the folds of so many proteins experimentally is extremely difficult, because these techniques are time-consuming and expensive. Computational methods that can automatically, rapidly, and accurately classify unknown protein sequences into specific fold categories are therefore urgently needed. Computational recognition of protein folds has recently become a research hotspot in bioinformatics and computational biology, and a variety of computational prediction methods have been developed. In this review, we conduct a comprehensive survey of recent computational methods, especially machine learning-based methods, for protein fold recognition. We anticipate that this review will help researchers gain a systematic understanding of the computational recognition of protein folds.
