Multiple mapping method: A novel approach to the sequence‐to‐structure alignment problem in comparative protein structure modeling

A major bottleneck in comparative protein structure modeling is the quality of the input alignment between the target sequence and the template structure. A number of alignment methods are available, but none of them produces consistently good solutions in all cases. Alignments produced by different methods may each be superior in some segments but inferior in others; therefore, an accurate solution often requires an optimal combination of them. To address this problem, we have developed a new approach, the Multiple Mapping Method (MMM). The algorithm first identifies the alternatively aligned regions in a set of input alignments. These alternatively aligned segments are scored using a composite scoring function, which determines their fitness within the structural environment of the template. For each region, the best-scoring alternative segment is combined with the core part of the alignments, on which the input methods agree, to produce the final MMM alignment. The algorithm was tested on a dataset of 1400 protein pairs using 11 combinations of two to four alignment methods. In all cases MMM showed a statistically significant improvement, reducing alignment errors by 3% to 17%. MMM also compared favorably with two alignment meta‐servers. The algorithm is computationally efficient and is therefore a suitable tool for genome‐scale modeling studies. Proteins 2006. © 2006 Wiley‐Liss, Inc.
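
The three steps described above (find the alternatively aligned regions, score each alternative, splice the winners into the shared core) can be illustrated with a minimal Python sketch. It is not the published MMM implementation. The alignment representation (one template index or `None` per target residue), the function names `variable_segments` and `combine`, and the toy `fitness` table standing in for the composite scoring function are all assumptions made for illustration.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Assumed representation: mapping[i] is the template residue index aligned
# to target residue i, or None if target residue i is aligned to a gap.
Mapping = List[Optional[int]]
Segment = Sequence[Optional[int]]


def variable_segments(alignments: Sequence[Mapping]) -> List[Tuple[int, int]]:
    """Return half-open (start, end) target ranges where the inputs disagree."""
    n = len(alignments[0])
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i in range(n):
        agree = all(a[i] == alignments[0][i] for a in alignments)
        if not agree and start is None:
            start = i                      # a variable region opens here
        elif agree and start is not None:
            segments.append((start, i))    # the region closed at position i
            start = None
    if start is not None:
        segments.append((start, n))        # region runs to the end of the target
    return segments


def combine(alignments: Sequence[Mapping],
            score: Callable[[int, Segment], float]) -> Mapping:
    """Keep the shared core and pick the best-scoring alternative per region.

    `score(start, segment)` is a hypothetical stand-in for a composite
    scoring function; higher means a better fit to the template environment.
    Note: this sketch does not re-check that the spliced mapping stays
    monotonic when a core position is a gap in every input.
    """
    result = list(alignments[0])           # core positions are identical in all inputs
    for start, end in variable_segments(alignments):
        best = max(alignments, key=lambda a: score(start, a[start:end]))
        result[start:end] = best[start:end]
    return result


# Toy example with two input alignments of a six-residue target.
method_a: Mapping = [0, 1, 2, 3, 4, 5]
method_b: Mapping = [0, 1, 3, 4, None, 5]

# Hypothetical per-pair fitness values: (target index, template index) -> score.
fitness = {(2, 3): 1.0, (3, 4): 1.0}


def toy_score(start: int, segment: Segment) -> float:
    """Sum the toy fitness over all aligned (non-gap) pairs in the segment."""
    return sum(fitness.get((start + k, t), 0.0)
               for k, t in enumerate(segment) if t is not None)


final = combine([method_a, method_b], toy_score)
print(variable_segments([method_a, method_b]))
print(final)
```

In this toy case the two methods disagree only on target residues 2 to 4, so that single region is rescored and the rest of the alignment is taken unchanged from the core. Because each region is scored independently, the cost grows linearly with the number of regions and input alignments, which is in keeping with the efficiency claim above, although the real scoring function is considerably richer than the lookup table used here.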
