A Discriminative Framework for Detecting Remote Protein Homologies

A new method for detecting remote protein homologies is introduced and shown to perform well in classifying protein domains by SCOP superfamily. The method is a variant of the support vector machine (SVM) that uses a new kernel function. The kernel function is derived from a generative statistical model for a protein family, in this case a hidden Markov model (HMM). This general approach of combining generative models such as HMMs with discriminative methods such as SVMs may have applications in other areas of biosequence analysis as well.
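The abstract does not spell out how the kernel is obtained from the HMM. One well-known construction for deriving a kernel from a generative model is the Fisher kernel. It is assumed here and not taken from this text. Each sequence X is mapped to its Fisher score U_X, the gradient of log P(X | θ) with respect to the model parameters θ, and two sequences are then compared through their scores.

The minimal sketch below rests on several assumptions:

- It uses a plain discrete HMM, not a profile HMM with match, insert and delete states.
- It differentiates only with respect to the emission probabilities b_s(x), treated as free parameters. This gives ∂ log P(X)/∂ b_s(x) = ξ(x, s) / b_s(x), where ξ(x, s) is the expected number of times state s emits symbol x given X, computed by forward-backward.
- It replaces the Fisher information matrix with the identity, a common simplification.

All function names are hypothetical.

```python
import numpy as np


def forward_backward(obs, pi, A, B):
    """Scaled forward-backward for a discrete HMM (hypothetical helper).

    obs: sequence of integer symbol indices
    pi:  (S,) initial state probabilities
    A:   (S, S) transitions, A[i, j] = P(state j | state i)
    B:   (S, M) emissions, B[s, x] = P(symbol x | state s)
    Returns gamma[t, s] = P(state s at t | obs) and log P(obs).
    """
    T, S = len(obs), len(pi)
    alpha = np.zeros((T, S))
    beta = np.ones((T, S))
    scale = np.zeros(T)
    # Forward pass, rescaling alpha at each step to avoid underflow.
    alpha[0] = pi * B[:, obs[0]]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]
    # Backward pass, reusing the forward scale factors.
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / scale[t + 1]
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    # log P(obs) is the sum of the log scale factors.
    return gamma, float(np.log(scale).sum())


def fisher_score(obs, pi, A, B):
    """Gradient of log P(obs) w.r.t. each emission probability B[s, x],
    treating the entries of B as free parameters (an assumption):
        d log P / d B[s, x] = xi[s, x] / B[s, x],
    where xi[s, x] is the expected number of times state s emits x.
    Assumes B has no zero entries."""
    gamma, _ = forward_backward(obs, pi, A, B)
    xi = np.zeros_like(B, dtype=float)
    for t, x in enumerate(obs):
        xi[:, x] += gamma[t]
    return (xi / B).ravel()


def fisher_kernel(obs1, obs2, pi, A, B, sigma=None):
    """Fisher kernel with the Fisher information replaced by the identity.
    sigma=None returns the inner product of the two score vectors;
    otherwise a Gaussian (RBF) kernel on the score vectors is returned."""
    u1 = fisher_score(obs1, pi, A, B)
    u2 = fisher_score(obs2, pi, A, B)
    if sigma is None:
        return float(u1 @ u2)
    return float(np.exp(-np.sum((u1 - u2) ** 2) / (2.0 * sigma ** 2)))


if __name__ == "__main__":
    # Toy 2-state HMM over a 3-letter alphabet (not a trained protein model).
    pi = np.array([0.6, 0.4])
    A = np.array([[0.7, 0.3], [0.2, 0.8]])
    B = np.array([[0.5, 0.3, 0.2], [0.1, 0.4, 0.5]])
    seqs = [[0, 1, 2, 2, 1], [0, 0, 1, 2], [2, 2, 2, 1, 0, 0]]
    # Gram matrix of pairwise kernel values between the toy sequences.
    K = np.array([[fisher_kernel(a, b, pi, A, B) for b in seqs] for a in seqs])
    print(K)
```

Under these assumptions, you could pass the resulting Gram matrix to any SVM implementation that accepts a precomputed kernel, for example scikit-learn's SVC(kernel="precomputed"). The optional Gaussian on the scores yields a bounded kernel with K(X, X) = 1. This sketch does not establish whether either variant performs well on SCOP superfamilies.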
