PiPred – a deep-learning method for prediction of π-helices in protein sequences

Canonical π-helices are short, relatively unstable secondary structure elements found in proteins. They comprise seven or more residues and are present in 15% of all known protein structures, often in functionally important regions such as ligand- and ion-binding sites. Because π-helices closely resemble α-helices, predicting them is challenging, and none of the currently available secondary structure prediction methods tackles the problem. Here, we present PiPred, a neural network-based tool for predicting π-helices in protein sequences. In a rigorous benchmark, we show that PiPred detects π-helices with a per-residue precision of 48% and a sensitivity of 46%. Interestingly, some of the α-helices that PiPred mispredicts as π-helices exhibit a geometry characteristic of π-helices. Moreover, despite being trained only on canonical π-helices, PiPred can identify 6-residue-long α/π-bulges. These observations suggest that the effective precision of the method is even higher, and that π-helices, α/π-bulges, and other helical deformations may impose similar constraints on sequences. PiPred is freely accessible at: https://toolkit.tuebingen.mpg.de/#/tools/quick2d. A standalone version is available for download at: https://github.com/labstructbioinf/PiPred, where we also provide the CB6133, CB513, CASP10, and CASP11 datasets, commonly used for training and validation of secondary structure prediction methods, with correctly annotated π-helices.
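
As an illustration of what the per-residue precision and sensitivity figures measure, the following minimal sketch computes both for a single secondary-structure label. It is not the benchmark code used for PiPred. The function name `per_residue_scores` and the toy strings are hypothetical, and the sketch assumes that π-helix residues are marked with the DSSP code `I` in both the reference and the predicted string.

```python
def per_residue_scores(true_ss, pred_ss, label="I"):
    """Return (precision, sensitivity) for one secondary-structure label.

    true_ss and pred_ss are equal-length strings with one state per residue.
    Assumption: 'I' marks pi-helix residues, as in DSSP output.
    """
    if len(true_ss) != len(pred_ss):
        raise ValueError("true and predicted strings must have equal length")

    tp = fp = fn = 0
    for t, p in zip(true_ss, pred_ss):
        if p == label and t == label:
            tp += 1          # residue correctly predicted as pi-helix
        elif p == label:
            fp += 1          # predicted pi-helix, but annotated otherwise
        elif t == label:
            fn += 1          # annotated pi-helix that was missed

    # Guard against division by zero when the label never occurs.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, sensitivity


# Toy, made-up strings (not data from the paper): the predicted pi-helix
# is shifted by two residues relative to the annotated one.
true_ss = "HHHHIIIIIIIHHHH--"
pred_ss = "HHHHHHIIIIIIIHH--"
precision, sensitivity = per_residue_scores(true_ss, pred_ss)
print(f"precision={precision:.2f} sensitivity={sensitivity:.2f}")
```

In this toy case five of the seven predicted π-helix residues overlap the annotated helix, so both scores equal 5/7. Counting per residue rather than per segment means that a shifted or shortened prediction is partially credited instead of being scored as a complete miss.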
