A 9-state hidden Markov model using protein secondary structure information for protein fold recognition

In protein fold recognition, the main disadvantage of hidden Markov models (HMMs) is their reliance on large model architectures, which require large data sets and substantial computational resources for training. HMMs should also take into account sequential information about protein secondary structure, which can improve prediction performance and reduce the number of model parameters. We therefore propose a novel HMM-based method for protein fold recognition, called the 9-state HMM. The method can (i) reduce the number of states by using secondary structure information about the proteins in each fold and (ii) recognize protein folds more accurately than other HMMs.
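
The general idea of fold recognition with small per-fold HMMs can be illustrated with a minimal sketch: train one compact model per fold, score a query sequence against each with the forward algorithm, and report the fold whose model gives the highest likelihood. Everything specific below is an assumption made for illustration, not the method described here: the example uses three hidden states (helix, strand, coil) rather than nine, a three-letter amino acid alphabet, hand-picked probabilities, and the hypothetical names `forward_log_likelihood`, `recognize_fold`, `all_alpha`, and `all_beta`.

```python
import math

NEG_INF = float("-inf")


def _log(p):
    """Logarithm that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def _log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a list of log values."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_likelihood(obs, start, trans, emit):
    """Return log P(obs | model) using the forward algorithm in log space.

    start[i]      : probability of starting in state i
    trans[i][j]   : probability of moving from state i to state j
    emit[i][sym]  : probability that state i emits symbol sym
    """
    n = len(start)
    alpha = [_log(start[i]) + _log(emit[i].get(obs[0], 0.0)) for i in range(n)]
    for sym in obs[1:]:
        alpha = [
            _log_sum_exp([alpha[i] + _log(trans[i][j]) for i in range(n)])
            + _log(emit[j].get(sym, 0.0))
            for j in range(n)
        ]
    return _log_sum_exp(alpha)


def recognize_fold(obs, models):
    """Score obs against every fold model and return (best_fold, scores)."""
    scores = {
        name: forward_log_likelihood(obs, m["start"], m["trans"], m["emit"])
        for name, m in models.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical toy parameters. States: 0 = helix, 1 = strand, 2 = coil.
EMIT = [
    {"A": 0.7, "V": 0.1, "G": 0.2},  # helix state favours alanine
    {"A": 0.1, "V": 0.7, "G": 0.2},  # strand state favours valine
    {"A": 0.2, "V": 0.2, "G": 0.6},  # coil state favours glycine
]

MODELS = {
    "all_alpha": {
        "start": [0.6, 0.1, 0.3],
        "trans": [[0.8, 0.05, 0.15], [0.3, 0.4, 0.3], [0.5, 0.1, 0.4]],
        "emit": EMIT,
    },
    "all_beta": {
        "start": [0.1, 0.6, 0.3],
        "trans": [[0.4, 0.3, 0.3], [0.05, 0.8, 0.15], [0.1, 0.5, 0.4]],
        "emit": EMIT,
    },
}

if __name__ == "__main__":
    for query in ("AAAAAGAAAA", "VVVVGVVVV"):
        fold, scores = recognize_fold(query, MODELS)
        print(query, "->", fold, scores)
```

Because each model has only a handful of states, the number of transition parameters stays small (a 3x3 matrix here), which is the kind of saving a reduced-state design aims for compared with architectures whose size grows with sequence length.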
