Using an alignment of fragment strings for comparing protein structures

MOTIVATION: Most methods for comparing protein structures use three-dimensional (3D) structural information. At the same time, it has been shown that a one-dimensional (1D) string representation of local protein structure retains a degree of structural information. Given the arsenal of sequence comparison tools developed in computational biology, this type of representation can be a powerful tool for protein structure comparison and classification. To exploit it, however, one must first understand how much structural information the various possible 1D representations of protein structure contain.

RESULTS: Here we describe the use of a particular structure fragment library for the 1D representation of protein structure; strings written in this fragment alphabet are denoted here as KL-strings. Using KL-strings, we develop an infrastructure for comparing protein structures through their 1D representations. This study focuses on the added value gained from such a description. We show that the new local structure language adds resolution to the traditional three-state (helix, strand and coil) secondary structure description, and that it recognizes structural similarities with a high degree of accuracy when used with a pairwise alignment benchmark. These results have immediate applications to fast structure recognition and to fold prediction and classification.
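The abstract does not spell out how a structure becomes a string or how two strings are compared, so the following Python sketch only illustrates the general encode-then-align idea under loudly stated assumptions. It is not the KL-string library or the scoring scheme used in this work. A hypothetical two-letter library (`H`, helix-like, and `E`, extended, four residues each, built from idealised C-alpha geometry) stands in for the real fragment library. Windows are matched to fragments by distance-matrix RMSD (dRMSD), swapped in for RMSD after optimal superposition because it needs no rotation fitting. The strings are then compared by Smith-Waterman local alignment with placeholder match/mismatch/gap scores instead of a substitution matrix estimated for the fragment alphabet.

```python
import math
from itertools import combinations

Coord = tuple[float, float, float]


def helix_ca(n: int) -> list[Coord]:
    """Idealised alpha-helix C-alpha trace: radius 2.3 A, rise 1.5 A, 100 degrees per residue."""
    return [(2.3 * math.cos(math.radians(100 * i)),
             2.3 * math.sin(math.radians(100 * i)),
             1.5 * i) for i in range(n)]


def strand_ca(n: int) -> list[Coord]:
    """Idealised extended C-alpha trace: 3.3 A rise per residue with a +/-1 A zigzag."""
    return [(3.3 * i, 1.0 if i % 2 else -1.0, 0.0) for i in range(n)]


# Hypothetical two-letter fragment library (four residues per fragment).
# A real library would hold many more fragments derived from known structures.
LIBRARY: dict[str, list[Coord]] = {"H": helix_ca(4), "E": strand_ca(4)}


def drmsd(a: list[Coord], b: list[Coord]) -> float:
    """Distance-matrix RMSD over all intra-fragment CA-CA distances.
    It is invariant to rotation and translation, so no superposition is needed."""
    diffs = [math.dist(a[i], a[j]) - math.dist(b[i], b[j])
             for i, j in combinations(range(len(a)), 2)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def encode(ca: list[Coord], library: dict[str, list[Coord]]) -> str:
    """Map a C-alpha trace to a 1D string: each overlapping window gets the
    letter of the library fragment with the smallest dRMSD to it."""
    k = len(next(iter(library.values())))  # assumes all fragments share one length
    return "".join(
        min(library, key=lambda letter: drmsd(ca[i:i + k], library[letter]))
        for i in range(len(ca) - k + 1)
    )


def smith_waterman(s: str, t: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of two strings (Smith-Waterman, linear gap penalty)."""
    prev = [0] * (len(t) + 1)
    best = 0
    for a in s:
        cur = [0]
        for j, b in enumerate(t, start=1):
            diag = prev[j - 1] + (match if a == b else mismatch)
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
            best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    sa = encode(helix_ca(10), LIBRARY)
    sb = encode(strand_ca(10), LIBRARY)
    print(sa, sb)                                          # HHHHHHH EEEEEEE
    print(smith_waterman(sa, sa), smith_waterman(sa, sb))  # 14 0
```

Replacing the toy `LIBRARY` with a larger fragment set, or the placeholder scores with a substitution matrix for the fragment alphabet, changes only the data and the scoring arguments. The pipeline of encoding each structure once and then reusing ordinary string alignment stays the same. That reuse is what makes a 1D representation attractive for fast structure comparison.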
