A Hidden Markov Model method, capable of predicting and discriminating β-barrel outer membrane proteins

Background

Integral membrane proteins constitute about 20–30% of all proteins in the fully sequenced genomes. They fall into two structural classes, the α-helical and the β-barrel membrane proteins, which differ in physicochemical characteristics, structure and localization. While transmembrane segment prediction for the α-helical integral membrane proteins is now a comparatively easy task, it remains much more difficult for the β-barrel membrane proteins. We developed a method, based on a Hidden Markov Model, capable of predicting the transmembrane β-strands of the outer membrane proteins of Gram-negative bacteria and of discriminating those proteins from water-soluble proteins in large datasets. The model is trained in a discriminative manner, aiming to maximize the probability of correct predictions rather than the likelihood of the sequences.

Results

The model was trained on a non-redundant set of 14 outer membrane proteins with structures known at atomic resolution. In a jackknife test it yielded a per-residue accuracy of 84.2% and a correlation coefficient of 0.72, whereas in the self-consistency test the per-residue accuracy was 88.1% and the correlation coefficient 0.824. The number of correctly predicted topologies is 10 out of 14 in the self-consistency test and 9 out of 14 in the jackknife test. Furthermore, the model is capable of discriminating outer membrane from water-soluble proteins in large-scale applications, with success rates of 88.8% and 89.2% for the correct classification of outer membrane and water-soluble proteins respectively, the highest rates reported in the literature. The discrimination test was performed independently, on a set of known outer membrane proteins with low sequence identity to each other and to the proteins of the training set.

Conclusion

Based on the above, we developed a strategy that enabled us to screen the entire proteome of E. coli for outer membrane proteins. The results were satisfactory; the method presented here therefore appears to be suitable for screening entire proteomes for the discovery of novel outer membrane proteins. A web interface available for non-commercial users is located at http://bioinformatics.biol.uoa.gr/PRED-TMBB, and it is the only freely available HMM-based predictor for β-barrel outer membrane protein topology.
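The per-residue accuracy and correlation coefficient quoted in the Results can be illustrated with a minimal sketch. It assumes a two-state labelling (transmembrane strand versus everything else) and the Matthews correlation coefficient; the function name `per_residue_scores`, the label character `"M"` and the toy strings are hypothetical and are not taken from the paper or from PRED-TMBB.

```python
from math import sqrt


def per_residue_scores(observed: str, predicted: str, positive: str = "M"):
    """Return (accuracy, Matthews correlation) for two label strings.

    Assumption: each character labels one residue, and `positive`
    marks a transmembrane strand residue (hypothetical encoding).
    """
    if len(observed) != len(predicted):
        raise ValueError("label strings must have the same length")
    if not observed:
        raise ValueError("label strings must not be empty")

    tp = tn = fp = fn = 0
    for obs, pred in zip(observed, predicted):
        obs_pos = obs == positive
        pred_pos = pred == positive
        if obs_pos and pred_pos:
            tp += 1
        elif not obs_pos and not pred_pos:
            tn += 1
        elif pred_pos:
            fp += 1  # predicted strand, observed non-strand
        else:
            fn += 1  # observed strand, predicted non-strand

    accuracy = (tp + tn) / len(observed)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # The coefficient is undefined when a marginal count is zero; 0.0 is a common convention.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc


if __name__ == "__main__":
    # Toy example: one strand residue is missed by the prediction.
    acc, mcc = per_residue_scores("MMMMOOOO", "MMMOOOOO")
    print(f"accuracy={acc:.3f} correlation={mcc:.3f}")
```

In a jackknife setting, such scores would be computed for each protein after training on the remaining ones, whereas the self-consistency test scores the same proteins the model was trained on.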
