Identification of Bacillus species using support vector machine and codon pair relative frequency

In this paper, we propose a new approach to identifying Bacillus species that combines a new feature, codon pair relative frequency, with a support vector machine (SVM). The problem is to determine which species an organism belongs to using information from only some of its genes. A solution is useful not only for studying evolutionary processes but also for predicting the species of damaged samples. First, we collect a gene database of sixteen Bacillus species from the National Center for Biotechnology Information (NCBI) website. Then, we extract the codon pair relative frequency feature of each gene for each species. Finally, we train an SVM on these feature vectors using the "one-against-rest" method. With the proposed method, we obtained good identification results on our database.
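
To make the feature concrete, the following is a minimal sketch of how a codon pair relative frequency vector might be computed for one gene. It rests on assumptions that the abstract does not state: a codon pair is taken to be two adjacent, non-overlapping codons read in frame from the start of the sequence, and the relative frequency of a pair is its count divided by the total number of valid pairs in the gene. The function name `codon_pair_relative_frequency` and the handling of ambiguous bases are illustrative choices, not the authors' implementation.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
# All 64 codons and all 64 * 64 = 4096 ordered codon pairs, in a fixed order.
CODONS = ["".join(c) for c in product(BASES, repeat=3)]
CODON_PAIRS = [a + b for a in CODONS for b in CODONS]
PAIR_INDEX = {pair: i for i, pair in enumerate(CODON_PAIRS)}


def codon_pair_relative_frequency(gene):
    """Return a 4096-dimensional relative frequency vector for one gene.

    Assumption (not from the source): pairs are adjacent codons read in
    frame, and each count is normalised by the number of valid pairs.
    """
    gene = gene.upper()
    usable = len(gene) - len(gene) % 3  # drop a trailing partial codon
    codons = [gene[i:i + 3] for i in range(0, usable, 3)]
    pairs = [a + b for a, b in zip(codons, codons[1:])]
    # Ignore pairs containing ambiguous bases such as N.
    counts = Counter(p for p in pairs if p in PAIR_INDEX)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(CODON_PAIRS)
    return [counts[p] / total for p in CODON_PAIRS]


if __name__ == "__main__":
    vec = codon_pair_relative_frequency("ATGGCTATGGCTTAA")
    print(len(vec), vec[PAIR_INDEX["ATGGCT"]])
```

Under these assumptions, each gene yields one fixed-length vector, and the vectors for all genes, labelled by species, could then be passed to any one-against-rest SVM implementation, for example scikit-learn's `OneVsRestClassifier` wrapped around `SVC`. The abstract does not say which software was used, so that pairing is only one possible choice.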