Incorporating Domain Knowledge into a Min-Max Modular Support Vector Machine for Protein Subcellular Localization

As biological sequences and their annotation data accumulate rapidly in public databases, the classification problems built from them are becoming larger and more complex, which calls for new classifier designs. A further open issue is how to incorporate explicit domain knowledge into learning methods. In this paper, we adopt a modular classifier, the min-max modular support vector machine (M3-SVM), to solve the protein subcellular localization problem, and we use domain knowledge in the form of taxonomy information to guide the task decomposition. Experimental results show that, compared with traditional SVMs, M3-SVM maintains overall accuracy while improving the average accuracy across locations. Taxonomy-based decomposition outperforms other decomposition methods on the majority of classes. The results also show that M3-SVM trains faster than traditional SVMs.
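The sketch below shows the min-max modular idea applied to a single two-class subproblem, assuming each training sample carries a taxonomy label used to split each class into subsets. One module is trained for every (positive subset, negative subset) pair. The outputs of modules that share a positive subset are combined with MIN, and those minima are combined with MAX. To stay dependency-free, the base learner here is a simple centroid-based linear discriminant standing in for an SVM. The names `train_centroid`, `group_by_taxon`, `MinMaxModular`, and the toy data are hypothetical and are not the paper's implementation.

```python
from collections import defaultdict
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Sample = Tuple[Vector, str]  # (feature vector, taxonomy label) -- assumed format


def train_centroid(pos: List[Vector], neg: List[Vector]) -> Callable[[Vector], float]:
    """Stand-in for an SVM: a linear discriminant through the midpoint of the class centroids."""
    dim = len(pos[0])
    mu_p = [sum(x[k] for x in pos) / len(pos) for k in range(dim)]
    mu_n = [sum(x[k] for x in neg) / len(neg) for k in range(dim)]
    w = [p - n for p, n in zip(mu_p, mu_n)]
    mid = [(p + n) / 2 for p, n in zip(mu_p, mu_n)]
    b = -sum(wi * mi for wi, mi in zip(w, mid))
    return lambda x: sum(wi * xi for wi, xi in zip(w, x)) + b


def group_by_taxon(samples: List[Sample]) -> List[List[Vector]]:
    """Task decomposition: split one class into subsets by taxonomy label."""
    groups = defaultdict(list)
    for x, taxon in samples:
        groups[taxon].append(x)
    return list(groups.values())


class MinMaxModular:
    def __init__(self, pos_samples: List[Sample], neg_samples: List[Sample], train=train_centroid):
        pos_groups = group_by_taxon(pos_samples)
        neg_groups = group_by_taxon(neg_samples)
        # One module per (positive subset, negative subset) pair; modules are independent,
        # so they could be trained in parallel on smaller problems.
        self.modules = [[train(p, n) for n in neg_groups] for p in pos_groups]

    def decision(self, x: Vector) -> float:
        # MIN over modules that share a positive subset, then MAX across positive subsets.
        return max(min(f(x) for f in row) for row in self.modules)

    def predict(self, x: Vector) -> int:
        return 1 if self.decision(x) > 0 else -1


# Toy usage with made-up 2-D features and taxonomy labels.
pos = [((2, 2), "animal"), ((3, 3), "animal"), ((2, -2), "plant"), ((3, -3), "plant")]
neg = [((-2, 2), "animal"), ((-3, 3), "animal"), ((-2, -2), "plant"), ((-3, -3), "plant")]
model = MinMaxModular(pos, neg)
```

With two taxonomic subsets per class, this yields a 2 × 2 grid of modules. In this toy data, `model.predict((2.5, 2.5))` returns `1` and `model.predict((-2.5, -2.5))` returns `-1`.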