Profile hidden Markov models

The recent literature on profile hidden Markov model (profile HMM) methods and software is reviewed. Profile HMMs turn a multiple sequence alignment into a position-specific scoring system suitable for searching databases for remotely homologous sequences. Profile HMM analyses complement standard pairwise comparison methods for large-scale sequence analysis. Several software implementations and two large libraries of profile HMMs of common protein domains are available. HMM methods performed comparably to threading methods in the CASP2 structure prediction exercise.
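
As a rough illustration of how an alignment becomes a position-specific scoring system, the sketch below derives per-column log-odds scores from a toy alignment. It is a simplification made for illustration, not the method of any particular package: it models only match-state emissions, whereas a full profile HMM also has insert and delete states with transition probabilities. The function names, the uniform background distribution, the add-one pseudocount, and the toy alignment are all assumptions.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def build_profile(alignment, pseudocount=1.0):
    """Return one dict of log-odds scores (in bits) per alignment column.

    Assumptions: uniform background frequencies and a flat pseudocount;
    gap characters are ignored when counting residues.
    """
    background = 1.0 / len(ALPHABET)
    profile = []
    for column in zip(*alignment):
        counts = Counter(c for c in column if c in ALPHABET)
        total = sum(counts.values()) + pseudocount * len(ALPHABET)
        profile.append({
            a: math.log2(((counts[a] + pseudocount) / total) / background)
            for a in ALPHABET
        })
    return profile


def score(profile, seq):
    """Ungapped score of a sequence with the same length as the profile."""
    if len(seq) != len(profile):
        raise ValueError("sequence and profile lengths differ")
    return sum(column[residue] for column, residue in zip(profile, seq))


# Hypothetical four-sequence alignment, four columns wide.
toy_alignment = ["ACDE", "ACDF", "AC-E", "GCDE"]
profile = build_profile(toy_alignment)
```

A sequence resembling the alignment, such as `"ACDE"`, scores higher under `score(profile, ...)` than an unrelated one such as `"WWWW"`, because each column rewards the residues that were observed at that position.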
