Building multiple sequence alignments with a flavor of HSSP alignments.

Homology-derived secondary structure of proteins (HSSP) is a well-known database of multiple sequence alignments (MSAs) that merges information from protein sequences and their three-dimensional structures. It is available for all proteins whose structures are deposited in the PDB. STING and (Java)Protein Dossier also use it to calculate and present relative entropy as a measure of the degree of conservation of each residue in those proteins. However, if STING and (Java)Protein Dossier are to support the analysis of protein structures that have been modeled computationally, or solved experimentally but not yet deposited in the PDB, a new method is needed for building alignments with a flavor of HSSP alignments (myMSAr). The present study describes such a method and its corresponding databank (SH2QS, a database of sequences homologous to the query [structure-having] sequence). Our main interest in developing myMSAr was to measure the degree of residue conservation for a given query sequence, regardless of whether it has a corresponding structure deposited in the PDB. In this study, we compare the measurements of residue conservation obtained from corresponding HSSP and SH2QS alignments. As case studies, we also present two biologically relevant examples: the first highlights the equivalence of residue-conservation analysis based on HSSP or SH2QS alignments, and the second presents the degree of residue conservation for a computationally modeled structure, which consequently has no alignment reported by HSSP.
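
To make the idea of relative entropy as a per-residue conservation measure concrete, the following is a minimal sketch that scores each column of an alignment against a background distribution. The function names, the uniform background over the 20 standard amino acids, and the handling of gaps (ignored) are illustrative assumptions; the abstract does not specify which background frequencies or gap treatment STING and (Java)Protein Dossier use.

```python
import math
from collections import Counter

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_relative_entropy(column, background=None):
    """Relative entropy (in bits) of one alignment column against a background.

    Assumption: with no background given, a uniform distribution over the
    20 standard amino acids is used. Gaps and unknown symbols are ignored.
    """
    if background is None:
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    residues = [c for c in column.upper() if c in background]
    if not residues:
        return 0.0  # column contains only gaps or unknown symbols
    counts = Counter(residues)
    total = len(residues)
    # Sum over observed residues of p * log2(p / q).
    return sum(
        (n / total) * math.log2((n / total) / background[aa])
        for aa, n in counts.items()
    )


def conservation_profile(alignment):
    """Relative entropy for every column of an alignment (equal-length rows)."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [
        column_relative_entropy("".join(row[i] for row in alignment))
        for i in range(length)
    ]


if __name__ == "__main__":
    # Hypothetical toy alignment; the first row plays the role of the query.
    toy_alignment = ["ACD-", "ACE-", "AGD-", "ACDK"]
    for position, score in enumerate(conservation_profile(toy_alignment), start=1):
        print(position, round(score, 3))
```

Under these assumptions a fully conserved column reaches the maximum score of log2(20) bits, while a column whose residues match the background distribution scores zero, so higher values indicate stronger conservation at that position of the query sequence.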
