Improving the Sensitivity and Specificity of Protein Homology Search by Incorporating Predicted Secondary Structures

In this paper, we improve homology search performance by combining predicted protein secondary structures with protein sequences. Previous research suggested that a straightforward combination of the two did not improve homology search performance, mostly because of errors in the structure prediction. We solve this problem by taking into account the confidence scores output by the prediction programs.
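To make the idea concrete, here is a minimal sketch of one way a per-position alignment score could use prediction confidence. It is an illustration under stated assumptions, not the scoring function of this paper: the function names, the product-of-confidences weighting, the `match`/`mismatch` values, and the `weight` parameter are all hypothetical, and confidences are assumed to be normalized to the range [0, 1].

```python
# Hypothetical sketch: the formula and parameter values are assumptions
# chosen for illustration, not taken from the paper.

def ss_score(ss_a, conf_a, ss_b, conf_b, match=1.0, mismatch=-1.0):
    """Secondary-structure term for one aligned pair of positions.

    ss_a, ss_b: predicted states, e.g. "H" (helix), "E" (strand), "C" (coil).
    conf_a, conf_b: prediction confidences, assumed normalized to [0, 1].
    The term shrinks toward 0 when either prediction is unreliable.
    """
    base = match if ss_a == ss_b else mismatch
    return base * conf_a * conf_b


def combined_score(seq_score, ss_a, conf_a, ss_b, conf_b, weight=2.0):
    """Sequence substitution score plus the confidence-weighted structure term."""
    return seq_score + weight * ss_score(ss_a, conf_a, ss_b, conf_b)


if __name__ == "__main__":
    # seq_score would normally come from a substitution matrix such as BLOSUM62.
    print(combined_score(4.0, "H", 1.0, "H", 1.0))  # confident agreement
    print(combined_score(4.0, "H", 0.0, "E", 1.0))  # one prediction has no confidence
```

With this weighting, a low-confidence prediction contributes almost nothing, so a likely prediction error cannot override the sequence score, while confident agreement or disagreement shifts it noticeably.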
