Porter 5: state-of-the-art ab initio prediction of protein secondary structure in 3 and 8 classes

Motivation: Although secondary structure predictors have been developed for decades, current ab initio methods still have some way to go to reach their theoretical limits. Moreover, the effort to harness ever-expanding data sets and deeper, more sophisticated machine learning techniques is ongoing.

Results: Here we present Porter 5, the latest release of one of the best-performing ab initio secondary structure predictors. Version 5 achieves 84% accuracy (84% SOV) when tested on 3 classes, and 73% accuracy (77% SOV) on 8 classes, on a large independent set, significantly outperforming all the most recent ab initio predictors we have tested.

Availability: The web and standalone versions of Porter 5 are available at http://distilldeep.ucd.ie/porter/.

Contact: gianluca.pollastri@ucd.ie
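As an illustration of how the 3-class and 8-class accuracy figures relate, the sketch below computes per-residue accuracy (often written Q3 and Q8) on a toy example. It is not Porter 5's evaluation code. The 8-to-3 reduction shown (H, G, I to helix; E, B to strand; everything else to coil) is one common convention and is an assumption here, as are the function names and the example strings. SOV is a segment-based measure and is not computed.

```python
# Assumed DSSP 8-to-3 reduction; other conventions exist in the literature.
DSSP_8_TO_3 = {
    "H": "H", "G": "H", "I": "H",  # helical states -> helix
    "E": "E", "B": "E",            # strand and bridge -> strand
    "T": "C", "S": "C", "-": "C",  # turn, bend, other -> coil
}


def reduce_to_3(ss8: str) -> str:
    """Map an 8-class secondary structure string to 3 classes."""
    return "".join(DSSP_8_TO_3[state] for state in ss8)


def q_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted class equals the observed class."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(observed)


# Hypothetical toy strings, not real data.
observed_8 = "-HHHHGGT-EEEB-"
predicted_8 = "-HHHHHHT-EEEE-"

q8 = q_accuracy(predicted_8, observed_8)
q3 = q_accuracy(reduce_to_3(predicted_8), reduce_to_3(observed_8))
print(f"Q8 = {q8:.3f}, Q3 = {q3:.3f}")
```

In this toy case the three 8-class errors (G predicted as H twice, B predicted as E) vanish after reduction, which is one reason 3-class accuracy is typically higher than 8-class accuracy for the same predictor.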
