Fold-LTR-TCP: protein fold recognition based on triadic closure principle

Protein fold recognition is an important task in protein structure and function studies and has attracted increasing attention. Existing computational predictors treat it as a multi-class classification problem and ignore the relationships among the proteins in the dataset, even though previous studies showed that these relationships are critical for protein homology analysis. In this study, protein fold recognition is instead treated as an information retrieval task. A Learning to Rank (LTR) model, trained in a supervised manner, searches the query protein against the template proteins and ranks them so that templates sharing the query's fold appear near the top. The triadic closure principle (TCP) is then applied to the ranking list generated by the LTR to improve its accuracy by taking into account the relationships among the query protein and the template proteins in the list. The resulting predictor is called Fold-LTR-TCP. In a rigorous test on the LE benchmark dataset, Fold-LTR-TCP achieved an accuracy of 73.2%, outperforming all the other competing methods.
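The re-ranking idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the TCP scoring rule, so the function `tcp_rerank`, the parameters `top_k` and `alpha`, the additive update, and all scores below are assumptions made purely for illustration. The sketch relies only on the general notion of triadic closure: if the query is strongly linked to template A, and A is strongly linked to template B, then the query is likely related to B as well.

```python
from typing import Dict, List, Tuple


def tcp_rerank(
    query_scores: Dict[str, float],
    template_sim: Dict[Tuple[str, str], float],
    top_k: int = 1,
    alpha: float = 0.5,
) -> List[str]:
    """Re-rank templates with a triadic-closure-style bonus (hypothetical rule).

    query_scores: LTR-style relevance score of each template for the query.
    template_sim: symmetric similarity between pairs of templates.
    top_k:        number of top-ranked templates used as anchors (assumed).
    alpha:        weight of the closure bonus (assumed).
    """

    def sim(a: str, b: str) -> float:
        # Look the pair up in either order; unknown pairs count as 0.
        return template_sim.get((a, b), template_sim.get((b, a), 0.0))

    # Initial ranking list: descending score, ties broken by name.
    initial = sorted(query_scores, key=lambda t: (-query_scores[t], t))
    anchors = initial[:top_k]

    adjusted = {}
    for t in query_scores:
        # Strongest two-step path query -> anchor -> t closes the triangle.
        support = max(
            (query_scores[a] * sim(a, t) for a in anchors if a != t),
            default=0.0,
        )
        adjusted[t] = query_scores[t] + alpha * support

    return sorted(adjusted, key=lambda t: (-adjusted[t], t))


# Toy data (invented): T2 scores low against the query but is similar to T1.
scores = {"T1": 0.9, "T2": 0.4, "T3": 0.5}
sims = {("T1", "T2"): 0.8, ("T1", "T3"): 0.1}

print(tcp_rerank(scores, sims, alpha=0.0))  # ranking without the bonus
print(tcp_rerank(scores, sims, alpha=0.5))  # ranking with the bonus
```

In this toy case T2 moves ahead of T3 once the bonus is applied, because it is strongly connected to the top-ranked template T1. The predicted fold would then be read from the top of the adjusted list.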
