Paper information - Bayesian Protein Secondary Structure Prediction With Near-Optimal Segmentations

Bayesian Protein Secondary Structure Prediction With Near-Optimal Segmentations

Secondary structure prediction is an invaluable tool in determining the 3-D structure and function of proteins. Typically, protein secondary structure prediction methods suffer from low accuracy in beta-strand predictions, where nonlocal interactions play a significant role. There is a considerable need to model such long-range interactions that contribute to the stabilization of a protein molecule. In this paper, we introduce an alternative decoding technique for the hidden semi-Markov model (HSMM) originally employed in the BSPSS algorithm and further developed in the IPSSP algorithm. The proposed method is based on the N-best paradigm, where a set of most likely segmentations is computed. To generate suboptimal segmentations (i.e., alternative prediction sequences), we developed two N-best search algorithms. The first is an A* stack decoder algorithm that extends paths (or hypotheses) by one symbol at each iteration. The second locally keeps the end positions of the K highest-scoring previous segments and performs backtracking. Both algorithms employ the hidden semi-Markov model described in Aydin et al. [5] and use Viterbi scoring to compute the N-best list. The availability of near-optimal segmentations and the utilization of the Viterbi scoring enable the sequences to be rescored using more complex dependency models that characterize nonlocal interactions in beta-sheets. After the score update, one can either keep the segmentations to be employed in 3-D structure prediction or predict the secondary structure by applying a weighted voting procedure to a set of the top-scoring M ≥ 1 segmentations. The accuracy measures of the N-best method, when used to predict the secondary structure, are shown to be comparable to or better than those of the classical Viterbi decoder (MAP estimator), tested under the single-sequence condition.
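The weighted voting step over the top-scoring M segmentations can be illustrated with a minimal Python sketch. The abstract does not specify how votes are weighted, so the weighting below (each segmentation votes in proportion to exp(log score), normalized against the best score) is an assumption, as are the function name `weighted_vote`, the three-letter alphabet H/E/L, and the toy scores.

```python
import math
from collections import defaultdict


def weighted_vote(nbest, m):
    """Combine the top-m labelings by per-residue weighted voting.

    nbest: list of (labels, log_score) pairs, where labels is a string
           over 'H' (helix), 'E' (strand), 'L' (loop), all of equal length.
    m:     number of top-scoring labelings that take part in the vote.

    Assumption (not from the paper): each labeling votes with a weight
    proportional to exp(log_score).
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    if not nbest:
        raise ValueError("nbest must not be empty")

    # Keep only the m highest-scoring labelings.
    top = sorted(nbest, key=lambda pair: pair[1], reverse=True)[:m]

    # Subtract the best log score before exponentiating to avoid underflow.
    best_log = top[0][1]
    weights = [math.exp(score - best_log) for _, score in top]

    length = len(top[0][0])
    consensus = []
    for i in range(length):
        votes = defaultdict(float)
        for (labels, _), w in zip(top, weights):
            votes[labels[i]] += w
        # The label with the largest accumulated weight wins the position.
        consensus.append(max(votes, key=votes.get))
    return "".join(consensus)


# Toy N-best list with made-up log scores.
example_nbest = [
    ("HHHLLEEE", -10.0),
    ("HHLLLEEE", -10.2),
    ("HHLLLEEL", -10.3),
]
print(weighted_vote(example_nbest, 1))
print(weighted_vote(example_nbest, 3))
```

With M = 1 the vote reduces to the single best segmentation. With larger M the consensus can differ from it at positions where lower-ranked segmentations agree with each other. A per-residue vote of this kind is not guaranteed to yield a labeling that the HSMM could itself produce.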
When no rescoring is applied, the stack decoder algorithm with sufficiently large M improves the overall sensitivity measure (Q3) of the Viterbi algorithm by 1.1%. At the same M value, the N-best Viterbi algorithm improves the Q3 measure by 0.25% as well as the sensitivity measures specific to each secondary structure type (Q_α^obs, Q_β^obs, Q_L^obs). When the sequences are rescored using the posterior probability distribution computed by the posterior decoding algorithm (MPM estimator), N-best Viterbi improves the Q3 measure of the Viterbi algorithm by 2.6%. The rescored N-best list approach also enables us to generate suboptimal segmentations that are valid sequences (i.e., realizable from the hidden semi-Markov model). Although the N-best algorithms and the score update procedure brought significant improvements over the Viterbi algorithm, they were not able to outperform the posterior decoding algorithm in the single-sequence condition. Further improvements in the prediction accuracy should be possible with the incorporation of sophisticated models for nonlocal interactions and other physical constraints that stabilize the overall structure of a protein.
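For readers unfamiliar with the accuracy measures quoted above, the following Python sketch computes them under their standard definitions: Q3 is the percentage of residues whose three-state label is predicted correctly, and the per-type sensitivity is the percentage of observed residues of one type that are predicted as that type. The function names `q3` and `q_obs` and the toy sequences are illustrative assumptions, not taken from the paper.

```python
def q3(predicted, observed):
    """Percentage of residues whose three-state label is predicted correctly."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have equal length")
    if not observed:
        raise ValueError("sequences must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def q_obs(predicted, observed, label):
    """Per-type sensitivity: of the residues observed as `label`,
    the percentage that were also predicted as `label`.
    Returns None when `label` never occurs in the observed sequence."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have equal length")
    total = sum(o == label for o in observed)
    if total == 0:
        return None
    hits = sum(p == o == label for p, o in zip(predicted, observed))
    return 100.0 * hits / total


# Toy example: H = helix, E = strand, L = loop (made-up sequences).
observed = "HHHHLLEEEL"
predicted = "HHHLLLEELL"
print(q3(predicted, observed))          # 8 of 10 residues match: 80.0
print(q_obs(predicted, observed, "H"))  # 3 of 4 observed H match: 75.0
```

The improvements reported in the abstract (for example, 1.1% or 2.6% in Q3) are differences in this measure between two decoders evaluated on the same set of proteins.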

Hakan Erdogan | Yücel Altunbasak | Zafer Aydin

[1] F. Jelinek. Fast sequential decoding algorithm using a stack, 1969.

[2] Simon Cawley, et al. HMM sampling and applications to gene finding and alternative splicing, 2003, ECCB.

[3] M J Sternberg, et al. A simple method to generate non-trivial alternate alignments of protein sequences, 1991, Journal of molecular biology.

[4] Silvio C. E. Tosatto, et al. MANIFOLD: protein fold recognition based on secondary structure, sequence similarity and enzyme classification, 2003, Protein engineering.

[5] B. Rost. Twilight zone of protein sequence alignments, 1999, Protein engineering.

[6] M. Waterman, et al. A new algorithm for best subsequence alignments with application to tRNA-rRNA comparisons, 1987, Journal of molecular biology.

[7] Yücel Altunbasak, et al. Protein secondary structure prediction with semi Markov HMMs, 2004, The 26th Annual International Conference of the IEEE Engineering in Medicine and Biology Society.

[8] V. Thorsson, et al. HMMSTR: a hidden Markov model for local sequence-structure correlations in proteins, 2000, Journal of molecular biology.

[9] Pierre Baldi, et al. Three-stage prediction of protein β-sheets by neural networks, alignments and graph algorithms, 2005, ISMB.

[10] B. Rost, et al. Prediction of protein secondary structure at better than 70% accuracy, 1993, Journal of molecular biology.

[11] Gianluca Pollastri, et al. Combining protein secondary structure prediction models with ensemble methods of optimal complexity, 2004, Neurocomputing.

[12] Lawrence R. Rabiner, et al. A tutorial on hidden Markov models and selected applications in speech recognition, 1989, Proc. IEEE.

[13] Pierre Baldi, et al. Three-stage prediction of protein β-sheets by neural networks, alignments and graph algorithms, 2005.

[14] Ronald M. Levy, et al. Iterative sequence/secondary structure search for protein homologs: comparison with amino acid sequence alignments and application to fold recognition in genome databases, 2000, Bioinform.

[15] F. Young. Biochemistry, 1955, The Indian Medical Gazette.

[16] Giovanni Soda, et al. Exploiting the past and the future in protein secondary structure prediction, 1999, Bioinform.

[17] Silvio C. E. Tosatto, et al. The SSEA server for protein secondary structure alignment, 2005, Bioinform.

[18] María S. Pérez-Hernández, et al. Bayesian network multi-classifiers for protein secondary structure prediction, 2004, Artif. Intell. Medicine.

[19] Guy M. McKhann, et al. Biochemistry. 3rd edition, 1988, The Yale Journal of Biology and Medicine.