Understanding the Prediction of Transmembrane Proteins by Support Vector Machine using Association Rule Mining

In the effort to understand protein structure, many computational approaches have been proposed recently. Among them, support vector machine (SVM) methods have been applied and have shown strong performance compared with other machine learning schemes. Despite this high performance, however, SVM approaches are hard to interpret because the SVM is a black-box model. To overcome this limitation, this study combines the SVM with an association rule based classifier, which can present a meaningful explanation of the prediction. To this end, a new association rule based classifier, PCPAR, was devised on the basis of the existing classifier CPAR so that it can handle sequential data. PCPAR creates patterns by merging the generated rules and then classifies sequential data by pattern matching. The experimental results show that, on sequential data, PCPAR performs better than CPAR with respect to both accuracy and the number of generated patterns, whether applied alone or combined with SVM. The combined scheme SVM_PCPAR generates more compact patterns than SVM_DT, the combination of SVM with a decision tree, with similar performance. These patterns are easily understandable and biologically meaningful.

Index Terms: support vector machine, association rule based classifier, decision tree, CPAR, PCPAR
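To make the idea of classifying sequential data by pattern match concrete, the following is a minimal Python sketch. It is not the PCPAR algorithm itself. The `Pattern` type, the wildcard character `.`, the fixed window length, the `score` field, and the example motifs are all hypothetical choices made only for illustration. The sketch assumes each pattern covers a fixed-length window of residues and that, among the matching patterns, the one with the highest score decides the class.

```python
from typing import List, NamedTuple


class Pattern(NamedTuple):
    """Hypothetical pattern: one character per window position."""
    motif: str    # '.' is a wildcard that matches any residue
    label: str    # class assigned when the pattern matches, e.g. "TM"
    score: float  # illustrative quality score used to rank matching patterns


def matches(motif: str, window: str) -> bool:
    """Return True if every non-wildcard position of motif equals the window."""
    if len(motif) != len(window):
        return False
    return all(m == "." or m == w for m, w in zip(motif, window))


def classify(window: str, patterns: List[Pattern], default: str = "non-TM") -> str:
    """Label a sequence window with the best-scoring matching pattern.

    Falls back to the default label when no pattern matches.
    """
    hits = [p for p in patterns if matches(p.motif, window)]
    if not hits:
        return default
    return max(hits, key=lambda p: p.score).label


# Made-up patterns, not taken from the paper's results.
patterns = [
    Pattern("L.LV.", "TM", 0.9),
    Pattern("K..E.", "non-TM", 0.8),
    Pattern("..L..", "TM", 0.6),
]

if __name__ == "__main__":
    for window in ["LALVG", "KAAEG", "GGGGG"]:
        print(window, classify(window, patterns))
```

Because each prediction can be traced back to a single readable motif, a scheme of this kind exposes the reason for a decision in a way that a raw SVM decision value does not.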