Automated selection of positions determining functional specificity of proteins by comparative analysis of orthologous groups in protein families

The increasing volume of genomic data opens new possibilities for the analysis of protein function. We introduce a method for the automated selection of residues that determine the functional specificity of proteins sharing a common general function: the specificity‐determining positions (SDP) prediction method. Such residues are assumed to be conserved within groups of orthologs, which are expected to share the same specificity, and to vary between paralogs. Thus, given a multiple sequence alignment of a protein family divided into orthologous groups, one can select the positions where the distribution of amino acids correlates with this division. Unlike previously published techniques, the method directly accounts for the nonuniformity of amino acid substitution frequencies. In addition, it does not require arbitrary thresholds; instead, it implements a formal threshold-selection procedure based on the Bernoulli estimator. We tested the SDP prediction method on the LacI family of bacterial transcription factors and on a sample of bacterial water and glycerol transporters belonging to the major intrinsic protein (MIP) family. In both cases, comparison with available experimental and structural data strongly supported our predictions.
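The abstract does not spell out how the correlation between an alignment column and the division into orthologous groups is scored. A minimal sketch, assuming that correlation is measured by mutual information between residue and group label and assessed against a null built by shuffling the group labels (a permutation test), might look like this. The function names `mutual_information` and `shuffled_z_score` and the toy alignment are hypothetical, and the sketch deliberately omits two features the abstract does claim for the method: the correction for nonuniform substitution frequencies and the Bernoulli-estimator threshold.

```python
import math
import random
from collections import Counter


def mutual_information(column, groups):
    """Mutual information (bits) between the residue in one alignment column
    and the orthologous-group label of each sequence."""
    n = len(column)
    joint = Counter(zip(column, groups))   # counts of (residue, group) pairs
    res_counts = Counter(column)            # marginal residue counts
    grp_counts = Counter(groups)            # marginal group sizes
    mi = 0.0
    for (res, grp), c in joint.items():
        p_joint = c / n
        p_indep = (res_counts[res] / n) * (grp_counts[grp] / n)
        mi += p_joint * math.log2(p_joint / p_indep)
    return mi


def shuffled_z_score(column, groups, n_shuffles=1000, seed=0):
    """Z-score of the observed mutual information against a null distribution
    obtained by randomly permuting the group labels (assumed null model)."""
    rng = random.Random(seed)
    observed = mutual_information(column, groups)
    labels = list(groups)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(labels)
        null.append(mutual_information(column, labels))
    mean = sum(null) / n_shuffles
    sd = math.sqrt(sum((x - mean) ** 2 for x in null) / n_shuffles)
    if sd == 0.0:
        # Degenerate null (e.g. a fully conserved column): every shuffle gives
        # the same score, so no evidence for or against specificity is available.
        return 0.0
    return (observed - mean) / sd


# Toy alignment: nine sequences in three hypothetical orthologous groups.
groups = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
columns = {
    "conserved": list("LLLLLLLLL"),       # same residue everywhere
    "group-specific": list("KKKRRREEE"),  # conserved within groups, varies between them
    "unrelated": list("KREKREKRE"),       # variation not tied to the groups
}
for name, col in columns.items():
    print(name, round(mutual_information(col, groups), 3),
          round(shuffled_z_score(col, groups), 2))
```

In this toy case the group-specific column reaches the maximal score of log2(3) ≈ 1.585 bits and a clearly positive Z-score, the conserved column scores zero, and the unrelated column falls below the null mean. Normalizing against a shuffled null is one way to compare columns with different residue compositions; it does not by itself address the substitution-frequency and threshold issues the method is said to handle.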
