Amino acid composition and hydrophobicity patterns of protein domains correlate with their structures

We examine the correlation between the sequence and tertiary structure for 212 domains from globular proteins and polypeptides. The sequence of each domain is described as a set of 25 features: the mole percent of each of the 20 amino acids, the number of residues in the domain, and the abundance of four simple patterns in the hydrophobicity profile of the sequence. Each domain, then, is described as a location in 25‐dimensional sequence‐feature space. We use pattern‐recognition methods to find the two axes through the 25‐dimensional sequence‐feature space that best discriminate, respectively, predominantly α‐helix domains from predominantly β‐strand domains (the “secondary structure vector,” SV) and parallel α/β domains from other domains (the “parallel vector,” PV). When we divide the domains into two categories based on whether the cysteine content is above (CYS‐RICH) or below (NORMAL) 4.5%, we find that the secondary structure vector for the subset of CYS‐RICH domains points in a significantly different direction than the equivalent vector for the NORMAL domains. Thus, CYS‐RICH and NORMAL domains are best treated separately. The secondary structure vector and the parallel vector for NORMAL domains describe statistically meaningful information, but the secondary structure vector for CYS‐RICH domains may not be as reliable. We show how the secondary structure content of a NORMAL domain can be predicted by projecting the domain in the feature space onto the secondary structure vector. We subdivide the domains into five structural classes based on whether there is a parallel or mixed β‐sheet in the domain and whether there are more helix or strand residues: NORMAL ALPHA, NORMAL BETA, NORMAL PARALLEL, CYS‐RICH ALPHA, and CYS‐RICH BETA. When we project the NORMAL domains onto the plane containing the origin of the feature space, SV, and PV, we see that ALPHA, BETA, and PARALLEL domains cluster in the plane, with the BETA cluster partially overlapping the PARALLEL cluster. The separations between the clusters are such that, by looking at the location of any given NORMAL domain in the plane, we can correctly predict its structural class with 83% accuracy. CYS‐RICH ALPHA and BETA domains cluster when projected onto the CYS‐RICH SV vector, and the classes can be predicted with 83% accuracy, but this accuracy for CYS‐RICH domains may not be statistically meaningful.
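
The feature construction and projection described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several pieces are assumptions made only for illustration: the abstract does not define the four hydrophobicity patterns, so the binary hydrophobic set `HYDROPHOBIC` and the four-residue patterns in `PATTERNS` are hypothetical placeholders; the vectors `sv` and `pv` would have to come from a separately trained discriminant analysis; and the plane coordinates are taken simply as dot products with the unit-length SV and PV, without orthogonalizing the two axes.

```python
from collections import Counter

import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumption: a crude binary hydrophobic/polar split, not taken from the paper.
HYDROPHOBIC = set("ACFILMVW")
# Assumption: placeholder patterns over the binary profile (h = hydrophobic,
# p = polar); the abstract does not say which four patterns were used.
PATTERNS = ("hphp", "hpph", "hhpp", "hhhh")
CYS_THRESHOLD = 4.5  # mole percent cysteine separating CYS-RICH from NORMAL


def sequence_features(seq):
    """Return a 25-element vector: 20 mole percents, length, 4 pattern abundances."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    mole_percent = [100.0 * counts[a] / n for a in AMINO_ACIDS]
    profile = "".join("h" if r in HYDROPHOBIC else "p" for r in seq)
    windows = max(n - 3, 1)  # number of 4-residue windows (at least 1)
    abundance = [
        sum(profile[i:i + 4] == p for i in range(n - 3)) / windows
        for p in PATTERNS
    ]
    return np.array(mole_percent + [float(n)] + abundance)


def is_cys_rich(features):
    """True if cysteine mole percent exceeds the 4.5% threshold."""
    return features[AMINO_ACIDS.index("C")] > CYS_THRESHOLD


def project(features, sv, pv):
    """Coordinates of a domain along the unit-length SV and PV axes."""
    sv = np.asarray(sv, dtype=float)
    pv = np.asarray(pv, dtype=float)
    return np.array([
        features @ (sv / np.linalg.norm(sv)),
        features @ (pv / np.linalg.norm(pv)),
    ])


if __name__ == "__main__":
    f = sequence_features("ACDEFGHIKLMNPQRSTVWY")
    # Stand-in axes for demonstration only; real SV and PV must be fitted.
    sv_demo = np.eye(25)[0]
    pv_demo = np.eye(25)[1]
    print(f.shape, is_cys_rich(f), project(f, sv_demo, pv_demo))
```

With real SV and PV vectors, the two coordinates returned by `project` would place a NORMAL domain in the plane where the ALPHA, BETA, and PARALLEL clusters are compared; how class boundaries are drawn in that plane is not specified in the abstract and is left out of the sketch.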
