A Deep Learning Network Approach to ab initio Protein Secondary Structure Prediction

Ab initio protein secondary structure (SS) predictions are used to generate tertiary structure predictions, which are in increasing demand because of the rapid discovery of proteins. Although recent methods have slightly outperformed earlier approaches to SS prediction, accuracy has stagnated at around 80 percent, and many wonder whether prediction can be advanced beyond this ceiling. Disciplines that have traditionally employed neural networks are experimenting with novel deep learning techniques in attempts to stimulate progress. Since neural networks have historically played an important role in SS prediction, we wanted to determine whether deep learning could contribute to the advancement of this field as well. We developed an SS predictor, which we call DNSS, that makes use of the position-specific scoring matrix generated by PSI-BLAST and deep learning network architectures. Graphics processing units and CUDA software were used to optimize the deep network architecture and to train the deep networks efficiently. Optimal parameters for the training process were determined, and a workflow comprising three separately trained deep networks was constructed to make refined predictions. This deep learning network approach was used to predict SS for a fully independent test dataset of 198 proteins, achieving a Q3 accuracy of 80.7 percent and a Sov accuracy of 74.2 percent.
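As a point of reference for the Q3 figure reported above, the following is a minimal sketch of how a three-state accuracy is commonly computed: the percentage of residues whose predicted state (helix H, strand E, or coil C) matches the observed state. The function name `q3_accuracy` and the example strings are illustrative assumptions, not part of DNSS, and the abstract does not state whether Q3 was pooled over all residues or averaged per protein; this sketch scores a single sequence.

```python
def q3_accuracy(true_ss: str, pred_ss: str) -> float:
    """Return the percentage of residues whose three-state label matches.

    Both arguments are strings over the alphabet H (helix), E (strand),
    C (coil), aligned residue by residue. Hypothetical helper for illustration.
    """
    if len(true_ss) != len(pred_ss):
        raise ValueError("observed and predicted strings must have equal length")
    if not true_ss:
        raise ValueError("sequences must be non-empty")
    # Count positions where the predicted state equals the observed state.
    correct = sum(t == p for t, p in zip(true_ss, pred_ss))
    return 100.0 * correct / len(true_ss)


# Made-up 10-residue example: 8 of 10 states agree.
observed = "CHHHHEEECC"
predicted = "CHHHCEEEEC"
print(q3_accuracy(observed, predicted))  # prints 80.0
```

Sov, the second measure reported, scores overlap between predicted and observed segments rather than individual residues, so it is not reproduced by this per-residue count.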
