Enhancing protein backbone angle prediction by using simpler models of deep neural networks

Protein structure prediction is a grand challenge. Prediction of protein structures through representations based on backbone dihedral angles has recently made significant progress, alongside the ongoing surge of deep neural network (DNN) research in general. However, we observe an overall trend in protein backbone angle prediction research: ever more complex neural networks are employed and ever more features are fed to them. While more features might add predictive power to a neural network, we argue that redundant features can instead clutter the input, and that the more complex neural networks may then merely be compensating for the resulting noise. From artificial intelligence and machine learning perspectives, problem representations and solution approaches interact with each other and thus jointly affect performance. We also argue that comparatively simpler predictors are easier to reconstruct than more complex ones. With these arguments in mind, we present a deep learning method named Simpler Angle Predictor (SAP) that trains simpler DNN models to enhance protein backbone angle prediction. We then show empirically that SAP can significantly outperform existing state-of-the-art methods on well-known benchmark datasets: for some types of angles, the differences in mean absolute error (MAE) are 6–8 degrees. The SAP program and its data are available from https://gitlab.com/mahnewton/sap.
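The abstract reports results as MAE over backbone angles but does not say how the error is computed. Because angles are periodic, a common convention in backbone angle prediction is to measure each error the shorter way around the circle, so that a prediction of -170° for a true value of 170° counts as 20° off rather than 340°. The sketch below assumes angles in degrees and that wrap-around convention; the function names `angular_error` and `angular_mae` are hypothetical, and this is not taken from the SAP code.

```python
from typing import Sequence


def angular_error(predicted: float, actual: float) -> float:
    """Absolute difference between two angles in degrees, wrapped into [0, 180]."""
    diff = abs(predicted - actual) % 360.0
    # Going the other way around the circle may be shorter.
    return min(diff, 360.0 - diff)


def angular_mae(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean absolute error over paired angles, respecting 360-degree periodicity."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if len(predicted) == 0:
        raise ValueError("need at least one angle pair")
    errors = [angular_error(p, a) for p, a in zip(predicted, actual)]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Hypothetical values: -170 and 170 are only 20 degrees apart on the circle.
    pred = [-170.0, -60.0, 120.0]
    true = [170.0, -45.0, 130.0]
    print(angular_mae(pred, true))  # prints 15.0
```

With these illustrative values, a naive absolute difference would report an MAE of about 121.7°, dominated by the single pair that straddles the ±180° boundary, whereas the wrapped version gives 15.0°.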
