Non-sequential structure-based alignments reveal topology-independent core packing arrangements in proteins

MOTIVATION: Proteins of the same class often share a secondary structure packing arrangement but differ in the order in which the secondary structure units appear in the sequence. We find that proteins that share a common core also share local sequence-structure similarities, and that these can be exploited to align structures with different topologies. In this study, segments from a library of local sequence-structure alignments were assembled hierarchically, enforcing compactness and conserved inter-residue contacts but not sequential ordering. Previous structure-based alignment methods often ignore sequence similarity, local structural equivalence and compactness.

RESULTS: The new program, SCALI (Structural Core ALIgnment), efficiently finds conserved packing arrangements, even when the aligned segments are not in the same sequential order. SCALI alignments conserve remote sequence similarity and contain fewer alignment errors. Clustering of our pairwise non-sequential alignments shows that recurrent packing arrangements exist in topologically different structures. For example, the three-layer sandwich domain architecture may be divided into four structural subclasses based on internal packing arrangements. These subclasses represent an intermediate level of structure classification, more general than topology but more specific than architecture, as defined in CATH. A strategy is presented for developing a set of predictive hidden Markov models based on multiple SCALI alignments.
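The idea of assembling local segment alignments under a contact constraint, without requiring sequential order, can be illustrated with a minimal Python sketch. This is not the SCALI implementation: the greedy longest-first strategy, the function names (`cross_conserved`, `assemble`), the segment representation `(start_a, start_b, length)` and the acceptance rule (at least one conserved contact with the segments already placed, used here as a crude stand-in for compactness) are all assumptions made for illustration.

```python
def _key(i, j):
    """Order a residue pair so it can be looked up in a contact set."""
    return (i, j) if i < j else (j, i)


def cross_conserved(segment, alignment, contacts_a, contacts_b):
    """Count contacts between a new segment and the current alignment
    that are present in both structure A and structure B."""
    count = 0
    for a1, b1 in segment:
        for a2, b2 in alignment:
            if _key(a1, a2) in contacts_a and _key(b1, b2) in contacts_b:
                count += 1
    return count


def assemble(segments, contacts_a, contacts_b):
    """Greedy, order-independent assembly of local segment alignments.

    segments   : list of (start_a, start_b, length) ungapped local matches
                 (hypothetical format, not taken from the paper)
    contacts_a : set of (i, j) residue contacts in structure A, with i < j
    contacts_b : set of (i, j) residue contacts in structure B, with i < j
    Returns a sorted list of aligned residue pairs (res_a, res_b).
    """
    alignment = []
    used_a, used_b = set(), set()
    # Longest segments first; no constraint on their order in either chain.
    for start_a, start_b, length in sorted(segments, key=lambda s: -s[2]):
        segment = [(start_a + k, start_b + k) for k in range(length)]
        # Each residue may be aligned at most once.
        if any(a in used_a or b in used_b for a, b in segment):
            continue
        # After the seed, require at least one conserved contact with what
        # has already been placed (assumed proxy for compactness).
        if alignment and cross_conserved(
            segment, alignment, contacts_a, contacts_b
        ) == 0:
            continue
        alignment.extend(segment)
        used_a.update(a for a, _ in segment)
        used_b.update(b for _, b in segment)
    return sorted(alignment)


# Toy example: two segments swap order between A and B, a third is isolated.
toy_segments = [(0, 5, 3), (5, 0, 3), (8, 8, 2)]
toy_contacts_a = {(1, 6)}
toy_contacts_b = {(1, 6)}
toy_alignment = assemble(toy_segments, toy_contacts_a, toy_contacts_b)
```

In the toy example the two swapped segments are both kept because residue 1 and residue 6 are in contact in both structures, whereas the third segment is dropped because it shares no conserved contact with the growing core. A sequential aligner would have to discard one of the swapped segments.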
