Relation between sequence similarity and structural similarity in proteins. Role of important properties of amino acids

In a previous paper we obtained ten orthogonal factors, linear combinations of which can express the properties of the 20 naturally occurring amino acids. In this paper, we assume that the properties most important in determining the three-dimensional structure of a protein (themselves linear combinations of these ten factors) are conserved properties, i.e., those that have been conserved during evolution. Two definitions of a conserved property are presented: (1) a conserved property for an average protein is defined as the linear combination of the ten factors that optimally expresses the similarity of one amino acid to another (hence, little change during evolution), as given by the relatedness odds matrix of Dayhoff et al.; (2) a conserved property for each position in the amino acid sequence (locus) of a specific family of homologous proteins (the cytochrome c family or the globin family) is defined as the linear combination of the ten factors that is common among the set of amino acids found at that locus when the sequences are properly aligned. When the specificity at each locus is averaged over all loci, the same features are observed for three expressions of these two definitions, namely the conserved property for an average protein, the average conserved property for the cytochrome c family, and the average conserved property for the globin family: bulk and hydrophobicity (information about packing and long-range interactions) are more important than other properties, such as the preference for adopting a specific backbone structure (information about short-range interactions). We also demonstrate that the sequence profile of a conserved property, defined for each locus of a protein family (definition 2), corresponds uniquely to the three-dimensional structure, whereas the conserved property for an average protein (definition 1) is not useful for the prediction of protein structure.
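To make definition 2 more concrete, the following is a minimal sketch, not the procedure used in the paper. It assumes a deliberately crude stand-in: instead of searching for an optimal linear combination of the ten factors, it only asks which single factor varies least among the distinct amino acids observed at each aligned locus. The three-component `TOY_FACTORS` table, its values, and the function names are all hypothetical and serve illustration only.

```python
from statistics import pvariance

# Hypothetical 3-factor scores for a few amino acids. These are illustrative
# values only, NOT the ten factors derived in the paper.
TOY_FACTORS = {
    "A": (-0.5, 0.2, 0.1),
    "V": (0.1, 0.9, -0.3),
    "L": (0.6, 1.0, 0.2),
    "I": (0.6, 1.1, -0.4),
    "K": (0.7, -1.2, 0.8),
    "E": (0.2, -1.0, 1.1),
    "G": (-1.4, 0.0, -0.9),
}


def locus_variances(alignment, factors=TOY_FACTORS):
    """For each aligned column (locus), return the population variance of
    every factor over the distinct amino acids observed in that column."""
    n_factors = len(next(iter(factors.values())))
    result = []
    for column in zip(*alignment):
        residues = sorted(set(column))  # the set of amino acids at this locus
        result.append(tuple(
            pvariance([factors[r][k] for r in residues])
            for k in range(n_factors)
        ))
    return result


def most_conserved_factor(variances):
    """Index of the factor with the smallest variance at each locus
    (ties resolve to the lowest index)."""
    return [min(range(len(v)), key=v.__getitem__) for v in variances]


# Three toy sequences, assumed to be already aligned (no gaps).
alignment = ["VLK", "ILE", "LLK"]
variances = locus_variances(alignment)
print(most_conserved_factor(variances))
```

A locus occupied by a single amino acid has zero variance in every factor, so it carries no information about which property is conserved there. This is one reason a large number of homologous sequences is needed before a per-locus property can be defined.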
The amino acid sequences of numerous proteins are searched for segments that are similar, in terms of the conserved properties (definition 2), to segments of the same size from one of the two homologous families (cytochrome c or globin) for whose loci the conserved properties were defined. Many similar segments are found, and the number of similarities decreases with increasing segment size. However, the segments must be rather long (≥15 residues) before the comparisons become meaningful. As an example, one sufficiently long segment (20 residues) from a protein of known structure (apo-liver alcohol dehydrogenase, which is not a member of either family) is found to be similar in the conserved properties to a particular segment of a member of the family of human hemoglobin α chains, and the two segments have similar structures. Since conserved properties are expected to be structure determinants, this means that they can be used to predict an initial structure, for subsequent energy minimization, of a protein whose conserved properties are similar to those of a known family. The family must contain a sufficiently large number of homologous amino acid sequences, because such a large number is required to define a conserved property for each locus.
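The segment search described above can be sketched as a sliding-window comparison. This is only an illustration under stated assumptions, not the authors' implementation: the conserved property at each locus is reduced to a single number taken from a hypothetical one-dimensional scale (`TOY_SCALE`, invented values), similarity is measured by the mean squared deviation between the query window and the family profile, and the threshold `max_msd` is arbitrary. The function and variable names are likewise assumptions.

```python
# Hypothetical one-dimensional property scale (illustrative values only).
TOY_SCALE = {
    "A": 0.3, "V": 0.9, "L": 1.0, "I": 1.1,
    "K": -1.2, "E": -1.0, "G": 0.0,
}


def window_matches(query, profile, scale, max_msd):
    """Slide `profile` along `query` and return (start, msd) for every window
    whose mean squared deviation from the profile is at most `max_msd`."""
    n = len(profile)
    hits = []
    for start in range(len(query) - n + 1):
        window = query[start:start + n]
        msd = sum((scale[r] - p) ** 2 for r, p in zip(window, profile)) / n
        if msd <= max_msd:
            hits.append((start, msd))
    return hits


# A 15-residue family segment (the abstract reports that shorter segments
# do not give meaningful comparisons) turned into a per-locus profile.
family_segment = "LKVAELGKVLAEGKL"
profile = [TOY_SCALE[r] for r in family_segment]

# A query that embeds a near-copy of the segment (two L/V -> I swaps)
# starting at index 3.
query = "GGA" + "IKVAELGKILAEGKL" + "EEK"

for start, msd in window_matches(query, profile, TOY_SCALE, max_msd=0.05):
    print(start, round(msd, 4))
```

In a fuller treatment each locus would carry its own linear combination of the ten factors rather than one shared scale, and the threshold would need to be calibrated against the number of chance matches, which the abstract notes falls off as the segment grows longer.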

[1]  R. Grantham Amino Acid Difference Formula to Help Explain Protein Evolution , 1974, Science.

[2]  Georg E. Schulz,et al.  Principles of Protein Structure , 1979 .

[3]  David M. Gay,et al.  Algorithm 611: Subroutines for Unconstrained Minimization Using a Model/Trust-Region Approach , 1983 .

[4]  G J Williams,et al.  The Protein Data Bank: a computer-based archival file for macromolecular structures. , 1977, Journal of molecular biology.

[5]  O. Ptitsyn Invariant features of globin primary structure and coding of their secondary structure , 1974 .

[6]  M. O. Dayhoff,et al.  Atlas of protein sequence and structure , 1965 .

[7]  D. Goeddel,et al.  DNA sequence of two closely linked human leukocyte interferon genes. , 1981, Science.

[8]  V. Bryson,et al.  Evolving Genes and Proteins. , 1965, Science.

[9]  D. F. Morrison,et al.  Multivariate Statistical Methods , 1968 .

[10]  W. Kabsch,et al.  Dictionary of protein secondary structure: Pattern recognition of hydrogen‐bonded and geometrical features , 1983, Biopolymers.

[11]  M J Sippl,et al.  On the problem of comparing protein structures. Development and applications of a new method for the assessment of structural similarities of polypeptide conformations. , 1982, Journal of molecular biology.

[12]  E. Thompson,et al.  Amino acid sequences of globin chains and their use in phylogenetic divergence point estimations. , 1979, UCLA forum in medical sciences.

[13]  J. C. Kendrew,et al.  Structure and function of haemoglobin: II. Some relations between polypeptide chain configuration and amino acid sequence , 1965 .

[14]  A. Lesk,et al.  How different amino acid sequences determine similar protein structures: the structure and evolutionary dynamics of the globins. , 1980, Journal of molecular biology.

[15]  M. Goodman,et al.  Decoding the pattern of protein evolution. , 1981, Progress in biophysics and molecular biology.

[16]  T. Taniguchi,et al.  Structure of a chromosomal gene for human interferon beta. , 1981, Proceedings of the National Academy of Sciences of the United States of America.

[17]  R. Dickerson The cytochromes c: an exercise in scientific serendipity. , 1979, UCLA forum in medical sciences.

[18]  J. Richardson,et al.  The anatomy and taxonomy of protein structure. , 1981, Advances in protein chemistry.

[19]  L. Pauling,et al.  Evolutionary Divergence and Convergence in Proteins , 1965 .

[20]  H. Scheraga,et al.  Statistical analysis of the physical properties of the 20 naturally occurring amino acids , 1985 .

[21]  Charles J. Epstein,et al.  Non-randomness of Amino-acid Changes in the Evolution of Homologous Proteins , 1967, Nature.

[22]  S. Pestka The human interferons--from protein purification and sequence to cloning and expression in bacteria: before, between, and beyond. , 1983, Archives of biochemistry and biophysics.

[23]  H. Scheraga,et al.  Energy parameters in polypeptides. 9. Updating of geometrical parameters, nonbonded interactions, and hydrogen bond interactions for the naturally occurring amino acids , 1983 .

[24]  S. Miyazawa,et al.  Relationship between mutability, polarity and exteriority of amino acid residues in protein evolution. , 1980, International journal of peptide and protein research.

[25]  G. W. Snedecor Statistical Methods , 1964 .

[26]  A. Shrake,et al.  Environment and exposure to solvent of protein atoms. Lysozyme and insulin. , 1973, Journal of molecular biology.

[27]  David Eisenberg,et al.  The helical hydrophobic moment: a measure of the amphiphilicity of a helix , 1982, Nature.

[28]  S Roy,et al.  Hydrophobic basis of packing in globular proteins. , 1980, Proceedings of the National Academy of Sciences of the United States of America.