Protein structural similarity search by Ramachandran codes

Background: Protein structural data has grown exponentially, so fast and accurate tools are needed for structural similarity search. To improve search speed, several methods reduce three-dimensional protein structures to one-dimensional text strings that are then analyzed by traditional sequence alignment methods; however, accuracy is usually sacrificed, and the speed still cannot match that of sequence similarity search tools. Here, we aimed to improve the linear encoding methodology and develop efficient search tools that can rapidly retrieve structural homologs from large protein databases.

Results: We propose a new linear encoding method, SARST (Structural similarity search Aided by Ramachandran Sequential Transformation). SARST transforms protein structures into text strings through a Ramachandran map organized by nearest-neighbor clustering and uses a regenerative approach to produce substitution matrices. Classical sequence similarity search methods can then be applied to structural similarity search. Its accuracy is similar to that of Combinatorial Extension (CE), and it runs over 243,000 times faster, searching 34,000 proteins in 0.34 sec with a 3.2-GHz CPU. SARST provides statistically meaningful expectation values for assessing the retrieved information. It has been implemented as a web service and as a stand-alone Java program that runs on many different platforms.

Conclusion: As a database search method, SARST can rapidly distinguish high from low similarities and efficiently retrieve homologous structures. It demonstrates that the easily accessible linear encoding methodology has the potential to serve as a foundation for efficient protein structural similarity search tools. These search tools should be applicable to automated and high-throughput functional annotation or prediction for the ever-increasing number of published protein structures in this post-genomic era.
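The linear encoding idea described in the abstract can be illustrated with a minimal Python sketch. This is not the SARST encoding: SARST organizes the Ramachandran map by nearest-neighbor clustering, whereas the sketch below assumes a uniform 60-degree grid over the (phi, psi) plane purely for illustration. The names `BIN_DEG`, `ALPHABET`, `encode_residue`, and `encode_backbone` are hypothetical, and the backbone dihedral angles are assumed to have been computed already.

```python
import string

# Assumption: a uniform 60-degree grid, NOT the clustered map used by SARST.
BIN_DEG = 60
N_BINS = 360 // BIN_DEG                              # 6 bins per axis, 36 cells
ALPHABET = string.ascii_uppercase + string.digits    # 36 symbols, one per cell


def encode_residue(phi, psi):
    """Map one (phi, psi) pair in degrees to a single character."""
    # Shift angles from [-180, 180] to [0, 360) and find the grid cell.
    col = int((phi + 180.0) % 360.0 // BIN_DEG)
    row = int((psi + 180.0) % 360.0 // BIN_DEG)
    return ALPHABET[row * N_BINS + col]


def encode_backbone(angles):
    """Turn a sequence of (phi, psi) pairs into a one-dimensional text string."""
    return "".join(encode_residue(phi, psi) for phi, psi in angles)


# Two helix-like residues followed by one strand-like residue.
code = encode_backbone([(-60.0, -45.0), (-60.0, -45.0), (-120.0, 130.0)])
print(code)
```

Once structures are reduced to strings like this, they can in principle be compared with ordinary sequence alignment tools, given a substitution matrix suited to the structural alphabet.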
