Structural alphabets derived from attractors in conformational space

Background: The hierarchical and partially redundant nature of protein structures justifies the definition of frequently occurring conformations of short fragments as 'states'. Collections of selected representatives for these states define Structural Alphabets, describing the most typical local conformations within protein structures. These alphabets form a bridge between the string-oriented methods of sequence analysis and the coordinate-oriented methods of protein structure analysis.

Results: A Structural Alphabet has been derived by clustering all four-residue fragments of a high-resolution subset of the Protein Data Bank and extracting the high-density states as representative conformational states. Each fragment is uniquely defined by a set of three independent angles corresponding to its degrees of freedom, capturing in simple and intuitive terms the properties of the conformational space. The fragments of the Structural Alphabet are equivalent to the conformational attractors and therefore yield a most informative encoding of proteins. Proteins can be reconstructed within the experimental uncertainty in structure determination and ensembles of structures can be encoded with accuracy and robustness.

Conclusions: The density-based Structural Alphabet provides a novel tool to describe local conformations and it is specifically suitable for application in studies of protein dynamics.
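The three-angle description of a four-residue fragment can be illustrated with a minimal sketch. It assumes, as the abstract does not state explicitly, that each fragment is represented by its four consecutive Cα positions and that the three angles are the two pseudo-bond angles and the one pseudo-torsion angle of that trace. The function names (`bond_angle`, `torsion`, `fragment_angles`, `trace_to_angles`) are hypothetical and chosen for illustration only; this is not the authors' implementation.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def bond_angle(p, q, r):
    """Angle in degrees at vertex q between the directions q->p and q->r."""
    u, v = _sub(p, q), _sub(r, q)
    c = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))  # clamp rounding noise

def torsion(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 axis."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n2 = _cross(b2, b3)
    x = _dot(_cross(b1, b2), n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))

def fragment_angles(ca):
    """Map four C-alpha positions to (theta1, theta2, tau) in degrees.

    Assumption: two pseudo-bond angles plus one pseudo-torsion angle.
    """
    if len(ca) != 4:
        raise ValueError("a fragment needs exactly four C-alpha positions")
    theta1 = bond_angle(ca[0], ca[1], ca[2])
    theta2 = bond_angle(ca[1], ca[2], ca[3])
    tau = torsion(ca[0], ca[1], ca[2], ca[3])
    return theta1, theta2, tau

def trace_to_angles(trace):
    """Angle triples for every overlapping four-residue window of a trace."""
    return [fragment_angles(trace[i:i + 4]) for i in range(len(trace) - 3)]
```

A chain of N residues yields N − 3 overlapping fragments, so `trace_to_angles` returns N − 3 triples. Clustering such triples and assigning each fragment to a representative state would be separate steps that are not shown here.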
