Analyzing the sequence-structure relationship of a library of local structural prototypes.

We present a thorough analysis of the relation between amino acid sequence and local three-dimensional structure in proteins. A library of overlapping local structural prototypes was built using an unsupervised clustering approach called the "hybrid protein model" (HPM). The HPM carries out a multiple structural alignment of local folds from a non-redundant protein structure databank encoded into a structural alphabet composed of 16 protein blocks (PBs). Building on previous research on the HPM protocol, we have allowed gaps in the local structure prototypes. This makes it possible to handle fragments of variable length. In this way, 120 local structure prototypes were obtained. Twenty-five percent of the protein fragments learnt by the HPM had gaps. An investigation of tight turns suggested that they are mainly derived from three PB series with precise locations in the HPM. The amino acid information content of all the conformational classes was analyzed by multivariate methods, e.g., canonical correlation analysis. This analysis points to seven amino acid equivalence classes showing high propensities for preferential local structures. In the same way, the definition of "contrast factors" based on sequence-structure properties underlines the specificity of certain structural prototypes, e.g., the dependence of Gly- or Asn-rich turns on a limited number of PBs, or the opposition between Pro-rich coils and those enriched in Ser, Thr, Asn and Glu. These results are thus useful for analyzing sequence-structure relationships, and could also be used to improve fragment-based methods for protein structure prediction from sequence.
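As an illustration of what "amino acid information content" of a conformational class can mean, the sketch below scores one class by the relative entropy (Kullback-Leibler divergence) between its amino acid distribution and a background distribution. This is an assumption made for illustration: the abstract does not state which measure was used, and the function name `relative_entropy` and the toy residue strings are hypothetical, not taken from the study.

```python
import math
from collections import Counter


def relative_entropy(class_residues, background_residues):
    """Kullback-Leibler divergence, in bits, of the amino acid
    distribution observed in one structural class from the background
    distribution. Illustrative measure only; assumed, not the paper's."""
    class_counts = Counter(class_residues)
    background_counts = Counter(background_residues)
    n_class = sum(class_counts.values())
    n_background = sum(background_counts.values())
    if n_class == 0 or n_background == 0:
        raise ValueError("both residue collections must be non-empty")

    total = 0.0
    for aa, count in class_counts.items():
        if background_counts[aa] == 0:
            # The divergence is undefined if the background never
            # contains a residue type seen in the class.
            raise ValueError(f"residue {aa!r} is absent from the background")
        p = count / n_class                        # frequency in the class
        q = background_counts[aa] / n_background   # background frequency
        total += p * math.log2(p / q)
    return total


# Toy data (hypothetical): a Gly-only class against a Gly/Ala background.
print(relative_entropy("GGGG", "GGAA"))
```

A value of zero means the class has the same composition as the background; larger values indicate a more specific sequence signature for that local structure.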
