Protein secondary structure prediction with semi Markov HMMs

Secondary structure prediction is an essential step in determining the structure and function of proteins. Prediction accuracy improves every year, moving toward the estimated theoretical limit of 88%. There are two approaches to secondary structure prediction. The first, ab initio (single sequence) prediction, does not use any homology information. The second uses evolutionary information, when available, to improve prediction accuracy by a few percentage points. In this paper, we address the problem of single sequence prediction by developing a semi Markov HMM similar to the one proposed by Schmidler et al. We introduce a better dependency model by considering the statistically significant amino acid correlation patterns at segment borders. We also propose an internal dependency model that captures right-to-left dependencies without modifying the left-to-right HMM topology. In addition, we propose an iterative training method to better estimate the HMM parameters. Putting all of these together, we obtained a 1.5% improvement in three-state-per-residue accuracy.
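
As a rough illustration of the kind of model discussed above, the following is a minimal sketch of segment-level Viterbi decoding for a toy semi Markov HMM, together with the three-state-per-residue accuracy. It is not the model of this paper: residues inside a segment are treated as independent (no border or internal dependency terms), and the names `segment_viterbi`, `toy_parameters`, and `q3`, the three-letter alphabet, and all probabilities are assumptions made up for the example.

```python
import math

STATES = "HEL"  # helix, strand, loop


def q3(predicted, observed):
    """Three-state-per-residue accuracy: fraction of positions that agree."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)


def toy_parameters(max_len):
    """Hypothetical parameters over a made-up alphabet 'AVG' (illustration only)."""
    preferred = {"H": "A", "E": "V", "L": "G"}
    log_init = {s: math.log(1 / 3) for s in STATES}
    # A segment is always followed by a segment of a different type.
    log_trans = {r: {s: math.log(0.5) for s in STATES if s != r} for r in STATES}
    # Uniform length distribution on 1..max_len; index 0 is unused.
    log_len = {s: [-math.inf] + [math.log(1 / max_len)] * max_len for s in STATES}
    log_emit = {
        s: {a: math.log(0.8 if a == preferred[s] else 0.1) for a in "AVG"}
        for s in STATES
    }
    return log_init, log_trans, log_len, log_emit


def segment_viterbi(seq, log_init, log_trans, log_len, log_emit, max_len):
    """Most probable segmentation of seq, returned as a per-residue label string.

    Simplification (assumption): residues are independent given the segment type.
    """
    n = len(seq)
    # best[j][s]: best log score of a segmentation of seq[:j] ending in type s.
    best = [{s: -math.inf for s in STATES} for _ in range(n + 1)]
    back = [{s: None for s in STATES} for _ in range(n + 1)]
    for j in range(1, n + 1):
        for s in STATES:
            emit = 0.0
            for d in range(1, min(max_len, j) + 1):
                i = j - d
                emit += log_emit[s][seq[i]]  # running emission score of seq[i:j]
                seg = emit + log_len[s][d]
                if i == 0:
                    score, prev = log_init[s] + seg, None
                else:
                    score, prev = max(
                        (best[i][r] + log_trans[r][s] + seg, r)
                        for r in STATES
                        if r != s
                    )
                if score > best[j][s]:
                    best[j][s] = score
                    back[j][s] = (i, prev)
    # Trace back from the best final segment type.
    labels = []
    j, s = n, max(STATES, key=lambda t: best[n][t])
    while j > 0:
        i, prev = back[j][s]
        labels.append(s * (j - i))
        j, s = i, prev
    return "".join(reversed(labels))
```

With the toy parameters, calling `segment_viterbi("AAAAVVVVGGGG", *toy_parameters(6), max_len=6)` decodes the sequence into per-residue labels that can be scored against a reference labeling with `q3`. The explicit length term `log_len` is what distinguishes a semi Markov model from an ordinary HMM, where segment lengths are implicitly geometric.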