Expert system for predicting protein localization sites in gram‐negative bacteria

We have developed an expert system that uses several kinds of knowledge, organized as “if‐then” rules, to predict protein localization sites in Gram‐negative bacteria from the amino acid sequence alone. We considered four localization sites: the cytoplasm, the inner (cytoplasmic) membrane, the periplasm, and the outer membrane. Most rules were derived from experimental observations. For example, an inner membrane protein is recognized by the presence of either a hydrophobic stretch in the predicted mature protein or an uncleavable N‐terminal signal sequence. Lipoproteins are first recognized by a consensus pattern and then assumed to be present at either the inner or the outer membrane; these two possibilities are discriminated by examining the N‐terminal portion of the mature protein for an acidic residue. In addition, we found empirically that periplasmic and outer membrane proteins can be discriminated by their amino acid composition. Overall, our system correctly predicted the localization sites of 83% of the proteins in our database.
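
The rule chain described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives neither the order in which the rules fire, the thresholds, nor the cytoplasmic rule, so the ordering, the `SequenceFeatures` fields, and the `predict_site` function are all hypothetical. The sketch assumes that each feature has already been computed from the sequence, that an acidic residue near the mature N‐terminus points to the inner membrane, and that a protein with no signal sequence and no hydrophobic stretch is cytoplasmic.

```python
from dataclasses import dataclass


@dataclass
class SequenceFeatures:
    """Hypothetical precomputed features; the abstract does not say how each is derived."""
    has_signal_sequence: bool                    # N-terminal signal sequence detected
    signal_is_cleavable: bool                    # a cleavage site is predicted for that signal
    hydrophobic_stretch_in_mature: bool          # hydrophobic stretch in the predicted mature protein
    matches_lipoprotein_pattern: bool            # lipoprotein consensus pattern found
    acidic_residue_near_mature_n_terminus: bool  # acidic residue in the mature N-terminal portion
    composition_favors_outer_membrane: bool      # outcome of the amino acid composition rule


def predict_site(f: SequenceFeatures) -> str:
    """Apply the if-then rules in an assumed order and return one of the four sites."""
    # Lipoproteins: membrane-bound, inner vs. outer decided by the acidic residue.
    # Assumption: an acidic residue indicates the inner membrane.
    if f.matches_lipoprotein_pattern:
        if f.acidic_residue_near_mature_n_terminus:
            return "inner membrane"
        return "outer membrane"

    # Inner membrane: hydrophobic stretch in the mature protein,
    # or an N-terminal signal sequence that is not cleaved.
    uncleavable_signal = f.has_signal_sequence and not f.signal_is_cleavable
    if f.hydrophobic_stretch_in_mature or uncleavable_signal:
        return "inner membrane"

    # Assumption: with no signal sequence at all, the protein stays in the cytoplasm.
    if not f.has_signal_sequence:
        return "cytoplasm"

    # Exported proteins: periplasm vs. outer membrane by amino acid composition.
    if f.composition_favors_outer_membrane:
        return "outer membrane"
    return "periplasm"


if __name__ == "__main__":
    example = SequenceFeatures(
        has_signal_sequence=True,
        signal_is_cleavable=True,
        hydrophobic_stretch_in_mature=False,
        matches_lipoprotein_pattern=False,
        acidic_residue_near_mature_n_terminus=False,
        composition_favors_outer_membrane=False,
    )
    print(predict_site(example))
```

Passing the features in as booleans keeps the sketch independent of any particular hydrophobicity scale, signal sequence predictor, or composition statistic, none of which are specified in the abstract.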
