Tools for integrated sequence-structure analysis with UCSF Chimera

BackgroundComparing related structures and viewing the structures in the context of sequence alignments are important tasks in protein structure-function research. While many programs exist for individual aspects of such work, there is a need for interactive visualization tools that: (a) provide a deep integration of sequence and structure, far beyond mapping where a sequence region falls in the structure and vice versa; (b) facilitate changing data of one type based on the other (for example, using only sequence-conserved residues to match structures, or adjusting a sequence alignment based on spatial fit); (c) can be used with a researcher's own data, including arbitrary sequence alignments and annotations, closely or distantly related sets of proteins, etc.; and (d) interoperate with each other and with a full complement of molecular graphics features. We describe enhancements to UCSF Chimera to achieve these goals.ResultsThe molecular graphics program UCSF Chimera includes a suite of tools for interactive analyses of sequences and structures. Structures automatically associate with sequences in imported alignments, allowing many kinds of crosstalk. A novel method is provided to superimpose structures in the absence of a pre-existing sequence alignment. The method uses both sequence and secondary structure, and can match even structures with very low sequence identity. Another tool constructs structure-based sequence alignments from superpositions of two or more proteins. Chimera is designed to be extensible, and mechanisms for incorporating user-specific data without Chimera code development are also provided.ConclusionThe tools described here apply to many problems involving comparison and analysis of protein structures and their sequences. Chimera includes complete documentation and is intended for use by a wide range of scientists, not just those in the computational disciplines. UCSF Chimera is free for non-commercial use and is available for Microsoft Windows, Apple Mac OS X, Linux, and other platforms from http://www.cgl.ucsf.edu/chimera.

[1]  Kenji Mizuguchi,et al.  HOMSTRAD: recent developments of the Homologous Protein Structure Alignment Database , 2004, Nucleic Acids Res..

[2]  Zaida Luthey-Schulten,et al.  Multiple Alignment of protein structures and sequences for VMD , 2006, Bioinform..

[3]  Amos Bairoch,et al.  PROSITE: A Documented Database Using Patterns and Profiles as Motif Descriptors , 2002, Briefings Bioinform..

[4]  Gilles Labesse,et al.  ViTO: tool for refinement of protein sequence-structure alignments , 2004, Bioinform..

[5]  G. H. Reed,et al.  The enolase superfamily: a general strategy for enzyme-catalyzed abstraction of the alpha-protons of carboxylic acids. , 1996, Biochemistry.

[6]  K Schulten,et al.  VMD: visual molecular dynamics. , 1996, Journal of molecular graphics.

[7]  M. O. Dayhoff,et al.  22 A Model of Evolutionary Change in Proteins , 1978 .

[8]  Christoph Gille,et al.  KISS for STRAP: user extensions for a protein alignment editor , 2003, Bioinform..

[9]  M. O. Dayhoff,et al.  Atlas of protein sequence and structure , 1965 .

[10]  Patrice Koehl,et al.  The ASTRAL Compendium in 2004 , 2003, Nucleic Acids Res..

[11]  W. Kabsch,et al.  Dictionary of protein secondary structure: Pattern recognition of hydrogen‐bonded and geometrical features , 1983, Biopolymers.

[12]  Marc A. Martí-Renom,et al.  ModView, visualization of multiple protein sequences and structures , 2003, Bioinform..

[13]  T. N. Bhat,et al.  The Protein Data Bank , 2000, Nucleic Acids Res..

[14]  A G Murzin,et al.  SCOP: a structural classification of proteins database for the investigation of sequences and structures. , 1995, Journal of molecular biology.

[15]  Arnaldo J. Montagner,et al.  STING Millennium Suite: integrated software for extensive analyses of 3d structures of proteins and their complexes , 2004, BMC Bioinformatics.

[16]  P E Bourne,et al.  Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. , 1998, Protein engineering.

[17]  G. Barton,et al.  Multiple protein sequence alignment from tertiary structure comparison: Assignment of global and residue confidence levels , 1992, Proteins.

[18]  J F Gibrat,et al.  Surprising similarities in structure comparison. , 1996, Current opinion in structural biology.

[19]  H. Wolfson,et al.  Optimization of multiple‐sequence alignment based on multiple‐structure alignment , 2005, Proteins.

[20]  Valentin A. Ilyin,et al.  Friend, an integrated analytical front-end application for bioinformatics , 2005, Bioinform..

[21]  Christus,et al.  A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins , 2022 .

[22]  S. Henikoff,et al.  Amino acid substitution matrices from protein blocks. , 1992, Proceedings of the National Academy of Sciences of the United States of America.

[23]  M. O. Dayhoff A model of evolutionary change in protein , 1978 .

[24]  Tim J. P. Hubbard,et al.  SCOP database in 2004: refinements integrate structure and sequence family data , 2004, Nucleic Acids Res..

[25]  Conrad C. Huang,et al.  UCSF Chimera—A visualization system for exploratory research and analysis , 2004, J. Comput. Chem..

[26]  Akinori Sarai,et al.  The Diamond STING server , 2005, Nucleic Acids Res..

[27]  Jimin Pei,et al.  AL2CO: calculation of positional conservation in a protein sequence alignment , 2001, Bioinform..

[28]  A Elofsson,et al.  Assessing the performance of fold recognition methods by means of a comprehensive benchmark. , 1996, Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing.

[29]  Alexej Abyzov,et al.  Structural alignment of proteins by a novel TOPOFIT method, as a superimposition of common volumes at a topomax point , 2004, Protein science : a publication of the Protein Society.

[30]  M S Waterman,et al.  Identification of common molecular subsequences. , 1981, Journal of molecular biology.

[31]  S H Bryant,et al.  Cn3D: sequence and structure views for Entrez. , 2000, Trends in biochemical sciences.