Gibbs motif sampling: Detection of bacterial outer membrane protein repeats

The detection and alignment of locally conserved regions (motifs) in multiple sequences can provide insight into protein structure, function, and evolution. A new Gibbs sampling algorithm is described that detects motif‐encoding regions in sequences and optimally partitions them into distinct motif models; this is illustrated using a set of immunoglobulin fold proteins. When applied to sequences sharing a single motif, the sampler can be used to classify motif regions into related submodels, as is illustrated using helix‐turn‐helix DNA‐binding proteins. Statistically based procedures are also described for searching a database for sequences that match motifs found by the sampler. When applied to a set of 32 very distantly related bacterial integral outer membrane proteins, the sampler revealed that they share a subtle, repetitive motif. Although BLAST (Altschul SF et al., 1990, J Mol Biol 215:403–410) fails to detect significant pairwise similarity between any of the sequences, the repeats present in these outer membrane proteins, taken as a whole, are highly significant (based on a generally applicable statistical test for motifs described here). Analysis of bacterial porins with known trimeric β‐barrel structure and related proteins reveals a similar repetitive motif corresponding to alternating membrane‐spanning β‐strands. These β‐strands occur on the membrane interface (as opposed to the trimeric interface) of the β‐barrel. The broad conservation and structural location of these repeats suggest that they play important functional roles.
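
To make the sampling idea concrete, the following is a minimal sketch of a basic Gibbs site sampler in Python. It is a deliberately simplified variant, not the algorithm described in this paper: it assumes exactly one motif occurrence per sequence, a single motif model of fixed width, and no repeats or submodels. The names `site_sampler` and `column_counts`, the pseudocount value, and the number of sweeps are all illustrative assumptions.

```python
import math
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard amino acids; other letters are not handled


def column_counts(seqs, pos, width, skip=None, pseudo=0.5):
    """Residue counts per motif column, leaving out sequence `skip` (hypothetical helper)."""
    counts = [{a: pseudo for a in ALPHABET} for _ in range(width)]
    for j, seq in enumerate(seqs):
        if j == skip:
            continue
        for k in range(width):
            counts[k][seq[pos[j] + k]] += 1
    return counts


def site_sampler(seqs, width, sweeps=200, pseudo=0.5, seed=0):
    """Simplified site sampler: one motif site of fixed width in every sequence."""
    rng = random.Random(seed)
    # Start from a random site in each sequence.
    pos = [rng.randrange(len(s) - width + 1) for s in seqs]

    # Background residue frequencies estimated from all sequences.
    bg = {a: pseudo for a in ALPHABET}
    for s in seqs:
        for ch in s:
            bg[ch] += 1
    total = sum(bg.values())
    bg = {a: c / total for a, c in bg.items()}

    # Column totals are the same for every column: N-1 sites plus pseudocounts.
    denom = len(seqs) - 1 + pseudo * len(ALPHABET)
    for _ in range(sweeps):
        for i, s in enumerate(seqs):
            # Predictive step: build the motif model from all other sequences.
            counts = column_counts(seqs, pos, width, skip=i, pseudo=pseudo)
            # Score every candidate start by its motif/background log-odds.
            log_w = []
            for start in range(len(s) - width + 1):
                log_w.append(sum(
                    math.log(counts[k][s[start + k]] / denom / bg[s[start + k]])
                    for k in range(width)
                ))
            # Sampling step: draw a new site in proportion to its odds ratio
            # (shifted by the maximum for numerical stability).
            top = max(log_w)
            weights = [math.exp(v - top) for v in log_w]
            pos[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return pos
```

Calling `site_sampler(sequences, width=10)` returns one sampled start position per sequence. Because the procedure is stochastic, a single run can settle on a poor alignment, so in practice several seeds would be tried and the resulting alignments compared.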
