A new method for identification of protein (sub)families in a set of proteins based on hydropathy distribution in proteins

Structural similarity among proteins is reflected in the distribution of hydropathy along the protein sequence. Similarities in hydropathy distributions are obvious for homologous proteins within a protein family. They have also been observed for proteins with related structures, even when sequence similarity is undetectable. Here we present a novel method that uses the hydropathy distribution to identify (sub)families in a set of (homologous) proteins. We represent each protein as a point in a generalized hydropathy space, that is, as a vector of specifically defined features derived from the hydropathy of the individual amino acids. Projecting this space onto its principal axes reveals groups of proteins with related hydropathy distributions. The groups identified correspond well to families of structurally and functionally related proteins. We found that this method accurately identifies protein families in a set of proteins, or subfamilies in a set of homologous proteins. Our results show that protein families can be identified by analyzing hydropathy distributions, without the need for sequence alignment. Proteins 2005. © 2005 Wiley‐Liss, Inc.
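The pipeline summarized above (hydropathy per residue, a feature vector per protein, projection onto principal axes) can be illustrated with a small sketch. The abstract does not say which features the authors use, so everything specific here is an assumption: the Kyte-Doolittle scale as the hydropathy source, a hypothetical six-element feature vector (mean hydropathy, fraction of hydrophobic residues, and normalized hydropathy autocorrelation at lags 1 to 4), z-score standardization, and PCA computed by power iteration. The names `hydropathy_features`, `standardize`, `pca` and `MAX_LAG` are illustrative, not the authors' method.

```python
import math

# Kyte-Doolittle hydropathy scale (assumed here as the per-residue hydropathy source).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Hypothetical choice: lags 1..4 span the ~2-residue strand and ~3.6-residue helix periods.
MAX_LAG = 4

def hydropathy_features(seq):
    """Hypothetical features: mean hydropathy, fraction with hydropathy > 0,
    and normalized hydropathy autocorrelation at lags 1..MAX_LAG."""
    h = [KD[aa] for aa in seq.upper() if aa in KD]  # skip non-standard residues
    n = len(h)
    if n == 0:
        raise ValueError("sequence has no standard amino acids")
    mean = sum(h) / n
    frac_hydrophobic = sum(1 for x in h if x > 0) / n
    dev = [x - mean for x in h]
    var = sum(d * d for d in dev)
    autocorr = []
    for k in range(1, MAX_LAG + 1):
        if var < 1e-12 or n <= k:  # flat profile or too short: no periodicity signal
            autocorr.append(0.0)
        else:
            autocorr.append(sum(dev[i] * dev[i + k] for i in range(n - k)) / var)
    return [mean, frac_hydrophobic] + autocorr

def standardize(rows):
    """Center each feature column and scale it to unit (population) variance."""
    n, d = len(rows), len(rows[0])
    out = [list(r) for r in rows]
    for j in range(d):
        mu = sum(r[j] for r in rows) / n
        sd = math.sqrt(sum((r[j] - mu) ** 2 for r in rows) / n)
        for i in range(n):
            out[i][j] = (rows[i][j] - mu) / sd if sd > 0 else 0.0
    return out

def _orthonormalize(w, basis):
    """Remove components of w along each unit vector in basis, then normalize."""
    for u in basis:
        dot = sum(wi * ui for wi, ui in zip(w, u))
        w = [wi - dot * ui for wi, ui in zip(w, u)]
    norm = math.sqrt(sum(wi * wi for wi in w))
    return None if norm < 1e-12 else [wi / norm for wi in w]

def pca(X, n_components=2, iters=500):
    """Principal axes of centered data X (n >= 2 rows) by power iteration,
    keeping each new axis orthogonal to the earlier ones."""
    n, d = len(X), len(X[0])
    C = [[sum(X[i][a] * X[i][b] for i in range(n)) / (n - 1) for b in range(d)]
         for a in range(d)]  # covariance matrix
    comps = []
    for _ in range(n_components):
        v = _orthonormalize([1.0 + 0.1 * j for j in range(d)], comps)
        if v is None:
            raise ValueError("could not initialize power iteration")
        for _ in range(iters):
            Cv = [sum(C[a][b] * v[b] for b in range(d)) for a in range(d)]
            w = _orthonormalize(Cv, comps)
            if w is None:  # no variance left in the remaining subspace
                break
            v = w
        comps.append(v)
    scores = [[sum(xj * cj for xj, cj in zip(x, c)) for c in comps] for x in X]
    return scores, comps

if __name__ == "__main__":
    # Toy sequences invented for illustration; they are not real proteins.
    toy = {
        "hydrophobic_1": "LIVLIVFLAVMLIV",
        "hydrophobic_2": "IVLLFAVILMVLAI",
        "polar_1": "DEKRNQDEKRNQDE",
        "polar_2": "KRDENQKDERNKQE",
        "alternating": "LDLDLDLDLDLDLD",
    }
    X = standardize([hydropathy_features(s) for s in toy.values()])
    scores, _ = pca(X, n_components=2)
    for name, (pc1, pc2) in zip(toy, scores):
        print(f"{name:14s} PC1={pc1:+.2f} PC2={pc2:+.2f}")
```

Sequences whose points fall close together in the (PC1, PC2) plane would be candidate members of the same (sub)family. Standardizing first keeps features with large numeric ranges (such as mean hydropathy) from dominating the principal axes; a library PCA routine could replace the hand-written power iteration.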
