DeepPSC (protein structure camera): computer vision-based protein backbone structure reconstruction from alpha carbon trace as a case study

Deep learning is increasingly used in protein tertiary structure prediction, a major goal in life science. However, most algorithms developed so far take protein sequences as input, whereas the vast amount of tertiary structure information available in the Protein Data Bank (PDB) remains largely unused because of the inherent complexity of 3D data computation. In this study, we propose the Protein Structure Camera (PSC), an approach that converts protein structures into images. As a case study, we developed DeepPSC, a deep learning method incorporating PSC, to reconstruct protein backbone structures from alpha carbon traces. DeepPSC outperformed all currently available methods for this task. The PSC approach provides a useful tool for protein structure representation and for applying deep learning to protein structure prediction and protein engineering.
