Applications of four machine learning algorithms in identifying bacterial essential genes based on composition features

Essential genes play vital roles in bacterial survival; they are potential antimicrobial targets and cornerstones of synthetic biology. Because identifying them by wet-lab experiments is costly and time-consuming, accurate computational recognition of bacterial essential genes has become necessary. In this paper, we evaluated the effectiveness of four machine learning methods in identifying bacterial essential genes: Support Vector Machine (SVM), SVM after Student's t-test (ttSVM), Principal Component Regression (PCR) and Kernel Principal Component Regression (KPCR). In total, 24 bacterial genomes were analyzed, and 544 compositional features were generated from the primary sequence of each genome. To make it easy for experimental scientists to compare the effectiveness of the four methods, we constructed a web server, which is freely available at http://cefg.uestc.edu.cn/ibeg.
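The ttSVM pipeline can be sketched as composition features followed by a t-test screen, assuming the t-test is used to rank features before the SVM is trained. This is a minimal, illustrative sketch, not the authors' implementation. The 20 mono- and dinucleotide frequencies are a hypothetical stand-in for the paper's 544 compositional features. The helper names `composition_features`, `student_t` and `rank_features` are invented for this example. The SVM step itself is omitted; the selected columns could be passed to any SVM implementation, for example scikit-learn's `SVC`.

```python
from itertools import product
from math import sqrt
from statistics import mean, variance

BASES = "ACGT"
DINUCS = ["".join(p) for p in product(BASES, repeat=2)]
FEATURE_NAMES = list(BASES) + DINUCS  # hypothetical 20-feature set


def composition_features(seq: str) -> list[float]:
    """Mono- and dinucleotide frequencies of a gene sequence (length >= 2).

    Illustrative subset only; not the 544 features used in the paper.
    """
    seq = seq.upper()
    n = len(seq)
    mono = [seq.count(b) / n for b in BASES]
    # Overlapping dinucleotides: positions (0,1), (1,2), ..., (n-2, n-1)
    pairs = [seq[i:i + 2] for i in range(n - 1)]
    di = [pairs.count(d) / (n - 1) for d in DINUCS]
    return mono + di


def student_t(a: list[float], b: list[float]) -> float:
    """Two-sample Student's t statistic with pooled variance (needs >= 2 per group)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    se = sqrt(pooled * (1 / na + 1 / nb))
    # t is undefined when both groups have zero variance; this sketch scores it as 0.
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def rank_features(x_ess: list[list[float]], x_non: list[list[float]], top_k: int) -> list[int]:
    """Return indices of the top_k features with the largest |t| between classes."""
    n_feat = len(x_ess[0])
    t_abs = [abs(student_t([x[j] for x in x_ess], [x[j] for x in x_non]))
             for j in range(n_feat)]
    # sorted() is stable, so ties keep their original feature order
    return sorted(range(n_feat), key=lambda j: -t_abs[j])[:top_k]


if __name__ == "__main__":
    # Toy sequences, made up for illustration only
    essential = ["GCGCGCGA", "GGCCGGCA", "GCGGCCGT"]
    nonessential = ["ATATATAG", "AATTAATC", "ATTAATAG"]
    x_ess = [composition_features(s) for s in essential]
    x_non = [composition_features(s) for s in nonessential]
    top = rank_features(x_ess, x_non, top_k=5)
    print([FEATURE_NAMES[j] for j in top])
```

Screening features before the SVM keeps only the columns that differ most between essential and non-essential genes. PCR and KPCR take a different route: they compress all features into principal components instead of discarding any.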