Novel protein folds and their nonsequential structural analogs

A newly determined protein structure is classified as a new fold if it is sufficiently dissimilar from all previously known protein structures. Structure alignment tools are used to analyze such structural similarities. We demonstrate that nonsequential structure alignment tools, which neglect the connectivity of the polypeptide chain, can yield alignments with significant similarity between proteins of known three‐dimensional structure and newly determined protein structures classified as new folds. The recently introduced protein structure alignment tool GANGSTA is specialized to perform nonsequential alignments with proper assignment of secondary structure types by focusing on helices and strands only. In the new version, GANGSTA+, the underlying algorithms were completely redesigned, which improves the quality of the structure alignments, allows alignment against a larger database of protein structures, and increases efficiency. We applied DaliLite, TM‐align, and GANGSTA+ to three protein crystal structures considered to be novel folds. For these novel folds, GANGSTA+ finds proteins in the ASTRAL40 database that possess significant structural similarities, although the alignments are nonsequential and in some cases involve secondary structure elements aligned in reverse orientation. A web server is available at http://agknapp.chemie.fu‐berlin.de/gplus for pairwise alignment, visualization, and database comparison.
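
To make the distinction between sequential and nonsequential alignment concrete, the following toy sketch may help. It is not the GANGSTA+ algorithm, whose details are not given here. It assumes each secondary structure element (SSE) is reduced to a type label ("H" for helix, "E" for strand) and a centroid, enumerates type-preserving one-to-one assignments by brute force, and scores them by how well centroid-to-centroid distances agree. The names `distance_mismatch` and `best_assignment`, the coordinates, and the scoring function are all hypothetical choices made for illustration. SSE orientation, and therefore reverse-orientation matches, is ignored.

```python
from itertools import permutations
from math import dist  # Python 3.8+

def distance_mismatch(sses_a, sses_b, mapping):
    """Mean absolute difference of centroid-centroid distances.

    mapping[i] is the index in sses_b assigned to SSE i of sses_a.
    """
    total, pairs = 0.0, 0
    for i in range(len(mapping)):
        for j in range(i + 1, len(mapping)):
            d_a = dist(sses_a[i][1], sses_a[j][1])
            d_b = dist(sses_b[mapping[i]][1], sses_b[mapping[j]][1])
            total += abs(d_a - d_b)
            pairs += 1
    return total / pairs if pairs else 0.0

def best_assignment(sses_a, sses_b, sequential):
    """Brute-force search for the lowest-mismatch SSE assignment.

    Only pairs of equal SSE type may be matched. With sequential=True
    the assignment must also preserve chain order; with sequential=False
    chain connectivity is neglected.
    """
    best = None
    for mapping in permutations(range(len(sses_b)), len(sses_a)):
        if any(sses_a[i][0] != sses_b[k][0] for i, k in enumerate(mapping)):
            continue  # helix must map to helix, strand to strand
        if sequential and list(mapping) != sorted(mapping):
            continue  # order along the chain would be violated
        score = distance_mismatch(sses_a, sses_b, mapping)
        if best is None or score < best[0]:
            best = (score, mapping)
    return best

# Hypothetical example: three helices occupying the same positions in
# space, but connected in a different order along the chain.
helices_a = [("H", (0.0, 0.0, 0.0)), ("H", (10.0, 0.0, 0.0)), ("H", (0.0, 5.0, 0.0))]
helices_b = [helices_a[2], helices_a[0], helices_a[1]]

seq_score, seq_map = best_assignment(helices_a, helices_b, sequential=True)
nonseq_score, nonseq_map = best_assignment(helices_a, helices_b, sequential=False)
print("sequential:   ", seq_map, round(seq_score, 2))
print("nonsequential:", nonseq_map, round(nonseq_score, 2))
```

In this example the order-preserving search is forced into a poor assignment, whereas the order-free search recovers the spatially identical arrangement. Exhaustive enumeration scales factorially with the number of SSEs, so it is usable only for very small inputs.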
