HMMs in Protein Fold Classification

A major limitation of most HMMs used for protein classification is their high dimensionality: they have many states and parameters, and estimating these reliably requires many training sequences. We therefore developed several low-complexity variations that can be applied even to protein families with only a few members. This chapter presents these variations. All of them use a hidden Markov model with a small number of states, called a reduced state-space HMM. The model is trained on both the amino acid sequence and the secondary structure of proteins whose 3D structure is known, and it is then used for protein fold classification.

We used data from the Protein Data Bank (PDB) and annotation from the SCOP database to train and evaluate the proposed variations on a number of protein folds belonging to the major structural classes. The results indicate that the variations classify proteins about as well as, and in some cases better than, SAM, a widely used HMM-based method for protein classification. Their major advantage is that they use a small number of states, so the training and scoring algorithms have low complexity and are relatively fast.

The main variations examined are a reduced state-space HMM with seven states (7-HMM), a reduced state-space HMM with three states (3-HMM), and an optimized 3-HMM, in which an optimization process is applied to the 3-HMM scores.
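
To make the low-complexity argument concrete, here is a minimal sketch of how a 3-state model could be trained and used for fold classification. It is not the authors' implementation. It assumes, hypothetically, that each hidden state corresponds to one secondary-structure class (helix H, strand E, coil C, e.g. a three-class reduction of DSSP assignments); that because every training residue carries a known structure label, parameters can be estimated by counting labelled transitions and emissions (with pseudocounts) instead of running Baum-Welch; and that a query of unknown structure is scored with the forward algorithm and assigned to the fold whose model gives the highest log-likelihood. The names `train_3hmm`, `log_forward` and `classify`, and the toy training data, are illustrative only. The score optimization of the optimized 3-HMM is not shown.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
STATES = "HEC"  # assumed states: helix, strand, coil


def _normalize_log(counts):
    """Turn a dict of counts into a dict of log-probabilities."""
    total = sum(counts.values())
    return {k: math.log(v / total) for k, v in counts.items()}


def train_3hmm(examples, pseudocount=1.0):
    """Fit a 3-state HMM from (sequence, secondary_structure) pairs.

    Each residue's state is given by its structure label, so the
    maximum-likelihood parameters are simple (pseudo)counts and no
    Baum-Welch iterations are needed.
    """
    start = {s: pseudocount for s in STATES}
    trans = {s: {t: pseudocount for t in STATES} for s in STATES}
    emit = {s: {a: pseudocount for a in AMINO_ACIDS} for s in STATES}
    for seq, ss in examples:
        if len(seq) != len(ss):
            raise ValueError("sequence and structure lengths differ")
        if not seq:
            continue
        start[ss[0]] += 1
        for i, (aa, s) in enumerate(zip(seq, ss)):
            emit[s][aa] += 1  # raises KeyError for non-standard residues
            if i + 1 < len(ss):
                trans[s][ss[i + 1]] += 1
    return {
        "start": _normalize_log(start),
        "trans": {s: _normalize_log(trans[s]) for s in STATES},
        "emit": {s: _normalize_log(emit[s]) for s in STATES},
    }


def _logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_forward(model, seq):
    """Forward algorithm in log space: log P(seq | model) over all state paths."""
    alpha = {s: model["start"][s] + model["emit"][s][seq[0]] for s in STATES}
    for aa in seq[1:]:
        alpha = {
            t: _logsumexp([alpha[s] + model["trans"][s][t] for s in STATES])
            + model["emit"][t][aa]
            for t in STATES
        }
    return _logsumexp(list(alpha.values()))


def classify(models, seq):
    """Return the fold whose model scores seq highest, plus all scores."""
    scores = {fold: log_forward(m, seq) for fold, m in models.items()}
    return max(scores, key=scores.get), scores


# Toy usage: one model per fold, trained on (sequence, structure) pairs.
models = {
    "all-alpha": train_3hmm([("AELLKKLAEE", "HHHHHHHHHH")]),
    "all-beta": train_3hmm([("VTVKVTVEVY", "EEEEEEEEEE")]),
}
fold, scores = classify(models, "LAEKLLEA")
print(fold, scores)  # fold == "all-alpha" for this toy data
```

With N states, the forward recursion costs O(L·N²) for a sequence of length L, so a 3-state model needs only nine transition terms per residue. By contrast, a profile HMM has a number of states that grows with the length of the family alignment, which is consistent with the speed advantage described above.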
