An ENSEMBLE machine learning approach for the prediction of all-alpha membrane proteins

MOTIVATION: All-alpha membrane proteins constitute a functionally relevant subset of the whole proteome. They account for about 10 to 30% of a cell's proteins, as estimated by sequence comparison and specific predictive methods. Because few membrane proteins have been solved at atomic resolution, the training/testing sets of predictive methods for protein topography and topology routinely include very few well-solved structures mixed with a hundred proteins known only at low resolution. Moreover, available predictors fail to predict recently crystallised membrane proteins (Chen et al., 2002). At present, the set of well-solved membrane proteins comprises some 59 chains of low sequence homology. It is therefore now possible to train and test predictors exclusively on proteins known at atomic resolution, and so to evaluate the performance of different methods more thoroughly.

RESULTS: We implement a cascade neural network (NN), two different hidden Markov models (HMM), and their ensemble (ENSEMBLE) as a new method. We train and test the three methods and ENSEMBLE in cross-validation on the 59 well-resolved membrane proteins. ENSEMBLE achieves a per-protein accuracy of 90% for topography and 71% for topology, outperforming the best single method by 7 and 5 percentage points, respectively. When tested on a low-resolution set of 151 proteins, none of which is homologous to the 59 proteins, the per-protein accuracy of ENSEMBLE is 76% for topography and 68% for topology. Our results also indicate that the performance of ENSEMBLE is higher than that of the best predictors presently available on the Web.
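To make the notions of combining three predictors and of per-protein accuracy concrete, the following is a minimal sketch and not the method of the paper. The abstract does not state how ENSEMBLE merges the NN and the two HMMs, so a per-residue majority vote is assumed here. The correctness criterion (same number of transmembrane segments, each overlapping its observed counterpart), the label alphabet (`T` for transmembrane, `-` otherwise) and all function names are likewise hypothetical.

```python
from itertools import groupby


def majority_vote(predictions):
    """Combine equal-length per-residue label strings by majority vote.

    Assumption: the combination rule is illustrative only; the abstract
    does not say how ENSEMBLE merges its component predictors.
    """
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")
    combined = []
    for column in zip(*predictions):
        # Call a residue transmembrane when more than half the predictors do.
        votes = sum(1 for label in column if label == "T")
        combined.append("T" if 2 * votes > len(column) else "-")
    return "".join(combined)


def segments(labels):
    """Return half-open (start, end) intervals of consecutive 'T' residues."""
    result = []
    position = 0
    for label, run in groupby(labels):
        run_length = len(list(run))
        if label == "T":
            result.append((position, position + run_length))
        position += run_length
    return result


def topography_correct(predicted, observed, min_overlap=1):
    """Hypothetical criterion: equal segment counts and pairwise overlap."""
    pred, obs = segments(predicted), segments(observed)
    if len(pred) != len(obs):
        return False
    return all(
        min(p_end, o_end) - max(p_start, o_start) >= min_overlap
        for (p_start, p_end), (o_start, o_end) in zip(pred, obs)
    )


def per_protein_accuracy(pairs):
    """Fraction of (predicted, observed) pairs judged correct as a whole."""
    correct = sum(topography_correct(p, o) for p, o in pairs)
    return correct / len(pairs)


# Toy labelings standing in for the NN and the two HMM outputs.
nn_pred = "--TTTTT---TTTT--"
hmm1_pred = "--TTTT----TTTTT-"
hmm2_pred = "---TTTT-----TT--"
observed = "---TTTTT--TTTT--"

ensemble_pred = majority_vote([nn_pred, hmm1_pred, hmm2_pred])
print(ensemble_pred)
print(per_protein_accuracy([(ensemble_pred, observed)]))
```

Under this criterion a protein counts as correct only when every helix is located, which is why per-protein accuracy is a stricter score than per-residue accuracy. A topology score would additionally require the orientation of the loops (inside or outside) to be right, which this sketch does not model.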
