SiteBinder: An Improved Approach for Comparing Multiple Protein Structural Motifs

New techniques and tools are needed to extract as much information as possible from the ever-growing repository of protein 3D structures. We report here on the development of a software tool for the multiple superimposition of large sets of protein structural motifs. Our superimposition methodology performs a systematic search for the atom pairing that provides the best fit. During this search, the RMSD values for all chemically relevant pairings are calculated using quaternion algebra. The number of evaluated pairings is markedly decreased by using PDB annotations for atoms. This approach guarantees that the best fit will be found, and it can be applied even when sequence similarity is low or absent. We have implemented this methodology in the Web application SiteBinder, which can process thousands of protein structural motifs in a very short time and provides an intuitive, user-friendly interface. Our benchmarking analysis demonstrated the robustness, efficiency, and versatility of the methodology and its implementation through the successful superimposition of 1000 experimentally determined structures for each of 32 eukaryotic linear motifs. We also demonstrate the applicability of SiteBinder using three case studies. We first compared the structures of 61 PA-IIL sugar binding sites containing nine different sugars, and we found that the sugar binding sites of PA-IIL and its mutants have a conserved structure despite binding different sugars. We then superimposed over 300 zinc finger central motifs and found that the molecular structure in the vicinity of the Zn atom is highly conserved. Finally, we superimposed 12 BH3 domains from pro-apoptotic proteins. Our findings support the hypothesis that there is a structural basis for the functional segregation of BH3-only proteins into activators and enablers.
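The search described in the abstract can be illustrated with a minimal sketch. It is not SiteBinder's implementation: the function names `quaternion_rmsd` and `best_pairing` are hypothetical, the RMSD step uses Horn's unit-quaternion formulation as one standard way of doing the calculation with quaternion algebra, and a plain string label per atom stands in for the PDB annotations that restrict which atoms may be paired. NumPy is assumed to be available.

```python
import itertools

import numpy as np


def quaternion_rmsd(a, b):
    """Minimum RMSD of paired (n, 3) point sets over all rigid-body motions."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a - a.mean(axis=0)  # remove translation by centering both sets
    b = b - b.mean(axis=0)
    s = a.T @ b  # 3x3 correlation matrix of the centered coordinates
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    # Symmetric 4x4 matrix from Horn's method; its largest eigenvalue is
    # the maximum of sum_k (R a_k) . b_k over all proper rotations R.
    n_mat = np.array([
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ])
    lam_max = np.linalg.eigvalsh(n_mat)[-1]  # eigenvalues come back ascending
    msd = (np.sum(a * a) + np.sum(b * b) - 2.0 * lam_max) / len(a)
    return float(np.sqrt(max(msd, 0.0)))  # clamp tiny negative round-off


def best_pairing(coords_a, labels_a, coords_b, labels_b):
    """Try every label-preserving atom pairing and keep the lowest RMSD.

    Labels are an assumption of this sketch: only atoms carrying the same
    label may be paired, which is what keeps the enumeration small.
    Returns (rmsd, [(index_in_a, index_in_b), ...]).
    """
    coords_a = np.asarray(coords_a, dtype=float)
    coords_b = np.asarray(coords_b, dtype=float)
    if sorted(labels_a) != sorted(labels_b):
        raise ValueError("both motifs must contain the same labelled atoms")
    groups_a, groups_b = {}, {}
    for i, lab in enumerate(labels_a):
        groups_a.setdefault(lab, []).append(i)
    for j, lab in enumerate(labels_b):
        groups_b.setdefault(lab, []).append(j)
    labs = sorted(groups_a)
    order_a = [i for lab in labs for i in groups_a[lab]]
    best_rmsd, best_pairs = float("inf"), None
    # Cartesian product of the permutations within each label group.
    per_group = (itertools.permutations(groups_b[lab]) for lab in labs)
    for combo in itertools.product(*per_group):
        order_b = [j for perm in combo for j in perm]
        rmsd = quaternion_rmsd(coords_a[order_a], coords_b[order_b])
        if rmsd < best_rmsd:
            best_rmsd, best_pairs = rmsd, list(zip(order_a, order_b))
    return best_rmsd, best_pairs
```

Because every pairing allowed by the labels is evaluated, the minimum found is the global one for that label scheme, at a cost equal to the product of the factorials of the group sizes. Finer labels therefore shrink the search quickly, which is the role the abstract attributes to PDB atom annotations.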
