Combining segmental semi-Markov models with neural networks for protein secondary structure prediction

Motivation: Predicting the secondary structure of a protein from its primary sequence alone has been approached from either a classification or a generative-model perspective. The most prominent classification methods use neural networks, which map a local window of residues in the sequence to the structural state of the central residue in the window and thus capture local interactions effectively. However, they fail to capture interactions among distant residues. Generative models based on Bayesian segmentation capture sequence-structure relationships using generalized hidden Markov models with explicit state duration. They capture non-local interactions through a joint sequence-structure probability distribution defined over structural segments. In this paper, we investigate a combined architecture, with Bayesian segmentation at the first stage and a neural network at the second stage, that captures both local and non-local correlations in order to increase single-sequence prediction accuracy. The combined architecture is further enhanced by neural network optimization and ensemble techniques.

Results: The proposed architecture has been built and tested on two widely studied databases comprising 480 and 608 protein sequences, respectively. It achieved accuracies above 71%, comparable to the highest accuracies reported so far for single-sequence methods, without using the evolutionary information provided by multiple sequence alignments. The required data sets and program codes are available at http://www.gippsland.monash.edu.au/research/publish/neurocomputing.zip.
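The local-window mapping described under Motivation can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration rather than a detail taken from the paper: the one-hot encoding, the half-width of 6 residues, the extra padding symbol, and the helper names `one_hot_windows` and `add_stage_one_posteriors`. The second helper shows only one plausible way a second-stage network could receive first-stage output, namely by appending hypothetical per-residue (helix, strand, coil) probabilities to each window; the abstract does not state how the two stages are actually coupled.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_windows(sequence, half_width=6):
    """Build one feature vector per residue from a window centred on it.

    Assumed encoding: 20 amino-acid slots plus one slot that marks
    positions falling outside the chain. Unknown residue symbols are
    left as an all-zero slot.
    """
    n_symbols = len(AMINO_ACIDS) + 1
    features = []
    for center in range(len(sequence)):
        vec = []
        for pos in range(center - half_width, center + half_width + 1):
            slot = [0] * n_symbols
            if 0 <= pos < len(sequence):
                idx = AA_INDEX.get(sequence[pos])
                if idx is not None:
                    slot[idx] = 1
            else:
                slot[-1] = 1  # padding beyond either end of the chain
            vec.extend(slot)
        features.append(vec)
    return features


def add_stage_one_posteriors(window_features, posteriors):
    """Append hypothetical first-stage class probabilities to each window.

    `posteriors` holds one (helix, strand, coil) triple per residue; how
    the paper actually passes first-stage output on is not specified.
    """
    if len(window_features) != len(posteriors):
        raise ValueError("need one posterior triple per residue")
    return [w + list(p) for w, p in zip(window_features, posteriors)]


if __name__ == "__main__":
    windows = one_hot_windows("MKVLA", half_width=2)
    stage_two_input = add_stage_one_posteriors(windows, [(0.2, 0.3, 0.5)] * 5)
    print(len(stage_two_input), len(stage_two_input[0]))
```

Each row of `stage_two_input` would then be paired with the structural state of the central residue as the training target for a classifier. The window alone sees only nearby residues, which is why some segment-level signal has to be supplied separately if non-local correlations are to reach the second stage.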
