Rapid Methods for Comparing Protein Structures and Scanning Structure Databases

Databases of three-dimensional macromolecular structures have grown so large that fast search tools and comparison methods became necessary, and several have been designed. All of them employ simplified representations of the three-dimensional structure: strings of characters of variable length, which can be handled with procedures designed for sequence analysis; fixed-dimension arrays, which can be processed with standard statistical methods; ensembles of secondary structural elements, which are far less numerous than the atoms or residues of the protein; and continuous representations of the backbone, through stereochemical figures. Some of these computational procedures were developed long ago, when computers were much slower, and others have been designed recently with the specific aim of handling large amounts of information. The present article focuses on the algorithms that allow fast structure comparison and are therefore particularly suitable for handling large databases. It is intended to provide a comprehensive picture that is useful for developing and assessing novel tools.
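
The first representation listed above, a string of characters compared with sequence-analysis procedures, can be illustrated with a minimal Python sketch. It is not the method of any specific tool covered by the article. It assumes that each residue is reduced to its Cα position, that the pseudo-torsion angle of four consecutive Cα atoms is binned into a four-letter alphabet with an arbitrary 90-degree bin width, and that two strings are scored with a plain Needleman-Wunsch global alignment. The function names (`dihedral`, `encode_backbone`, `nw_score`), the alphabet, and the scoring values are all hypothetical choices made for illustration.

```python
import math

def _sub(a, b):
    return [x - y for x, y in zip(a, b)]

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees defined by four points."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def encode_backbone(ca_coords, alphabet="ABCD"):
    """Turn a Calpha trace into a string, one letter per pseudo-torsion.

    Assumption: equal-width angular bins (90 degrees for four letters).
    """
    width = 360.0 / len(alphabet)
    letters = []
    for i in range(len(ca_coords) - 3):
        angle = dihedral(*ca_coords[i:i + 4])
        letters.append(alphabet[int((angle + 180.0) // width) % len(alphabet)])
    return "".join(letters)

def nw_score(s, t, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    prev = [j * gap for j in range(len(t) + 1)]
    for i, a in enumerate(s, 1):
        cur = [i * gap]
        for j, b in enumerate(t, 1):
            diag = prev[j - 1] + (match if a == b else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]

def ideal_helix(n, radius=2.3, rise=1.5, turn_deg=100.0):
    """Synthetic helical Calpha trace (illustrative geometry, not real data)."""
    return [(radius * math.cos(math.radians(i * turn_deg)),
             radius * math.sin(math.radians(i * turn_deg)),
             rise * i) for i in range(n)]

if __name__ == "__main__":
    query = encode_backbone(ideal_helix(12))
    target = encode_backbone(ideal_helix(10))
    print(query, target, nw_score(query, target))
```

Because a structure of n residues collapses to a string of n - 3 letters, scanning a database amounts to string comparison, which is the source of the speed gain. A real tool would need a more carefully chosen alphabet and a substitution matrix tuned to structural states.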
