Transmembrane topology prediction methods: A re-assessment and improvement by a consensus method using a dataset of experimentally-characterized transmembrane topology

We selected 10 transmembrane (TM) prediction methods (KKD, TMpred, TopPred II, DAS, TMAP, MEMSAT 2, SOSUI, PRED-TMR2, TMHMM 2.0 and HMMTOP 2.0) and re-assessed their prediction performance using a reliable dataset of 122 entries with experimentally characterized TM topologies. We then improved prediction performance with a consensus prediction method. Prediction performance in both the re-assessment and the consensus prediction was evaluated on four attributes: (i) the number of transmembrane segments (TMSs), (ii) the number of TMSs plus TMS position, (iii) N-tail location and (iv) TM topology. Hidden Markov model-based methods outperformed the other methods in individual prediction performance for all four attributes. In addition, the top-performing methods were generally model-based. Among prokaryotic sequences, HMMTOP 2.0 was the sole top performer, with prediction accuracies ranging from 64% to 86% across all attributes. Among eukaryotic sequences, however, prediction performance for all attributes was poorer than among prokaryotic sequences. Our proposed consensus prediction method significantly improved prediction performance, by at least nine percentage points, particularly among prokaryotic sequences for the number of TMSs (84%), the number of TMSs plus position (80%) and TM topology (74%). Although the consensus method also improved prediction performance among eukaryotic sequences, the accuracies obtained for all attributes were lower than those obtained for prokaryotic sequences, particularly for TM topology.
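
To make the four evaluation attributes concrete, the following Python sketch shows one way a single prediction could be scored against an experimentally characterized reference. It is an illustration under stated assumptions, not the scoring procedure used in this study: the `Topology` class, the `score` and `accuracy` functions, and the minimum overlap of 9 residues used to decide whether a predicted TMS matches a reference TMS are all hypothetical choices. It also assumes that TM topology is counted as correct only when both the TMS number plus position and the N-tail location are correct.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Segment = Tuple[int, int]  # (start, end) residue positions, inclusive

# Assumed threshold: residues two segments must share to count as a match.
MIN_OVERLAP = 9


@dataclass
class Topology:
    segments: List[Segment]  # transmembrane segments (TMSs)
    n_tail: str              # location of the N-terminus: "in" or "out"


def overlap(a: Segment, b: Segment) -> int:
    """Number of residues shared by two segments (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def score(pred: Topology, ref: Topology,
          min_overlap: int = MIN_OVERLAP) -> Dict[str, bool]:
    """Score one prediction on the four attributes."""
    same_number = len(pred.segments) == len(ref.segments)
    # Position is only checked when the TMS counts agree; segments are
    # paired in sequence order.
    same_position = same_number and all(
        overlap(p, r) >= min_overlap
        for p, r in zip(sorted(pred.segments), sorted(ref.segments))
    )
    same_n_tail = pred.n_tail == ref.n_tail
    return {
        "tms_number": same_number,
        "tms_number_and_position": same_position,
        "n_tail": same_n_tail,
        "topology": same_position and same_n_tail,  # assumed definition
    }


def accuracy(pairs: List[Tuple[Topology, Topology]], attribute: str) -> float:
    """Percentage of (prediction, reference) pairs correct for one attribute."""
    hits = sum(score(pred, ref)[attribute] for pred, ref in pairs)
    return 100.0 * hits / len(pairs)
```

Under these assumptions, calling `accuracy` once per attribute over all (prediction, reference) pairs for a given method would yield per-attribute percentages of the kind reported above.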