Evaluation of Standard Classifiers for Protein Subcellular Localization

Predicting the subcellular location of a protein is a well-known problem in biological science. Knowing where a protein resides inside the cell helps in understanding cell function and supports applications such as drug development. This paper compares three standard classifiers used to predict protein location: Classification and Regression Tree (CART), K-Nearest Neighbor (KNN), and Support Vector Machine (SVM). The classification models are tested with the hold-out method, and the experiment is carried out on the Yeast dataset. The results are evaluated with four metrics: Accuracy, Macro-average Precision, Macro-average Recall, and Macro-F1 Score. According to the results, SVM achieves better Accuracy and Macro-average Precision than the other two classifiers, whereas CART achieves better Macro-average Recall and Macro-F1 Score.
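As an illustration of the four metrics named above, the following is a minimal sketch in plain Python of how they can be computed from true and predicted labels on a held-out test set. It is not the paper's implementation. The function name `macro_metrics`, the toy label lists, and the choice to define Macro-F1 as the unweighted mean of per-class F1 scores are assumptions made for this example; some authors instead take the harmonic mean of macro-precision and macro-recall, and the abstract does not say which definition was used.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall and F1.

    Assumption: Macro-F1 is the unweighted mean of per-class F1 scores.
    A class with no predicted (or no true) samples contributes 0.0.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for c in labels:
        p = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        r = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "macro_precision": sum(precisions) / n,
        "macro_recall": sum(recalls) / n,
        "macro_f1": sum(f1s) / n,
    }


# Toy labels for illustration only; these are not results from the paper.
y_true = ["CYT", "CYT", "NUC", "NUC", "MIT", "MIT"]
y_pred = ["CYT", "NUC", "NUC", "NUC", "MIT", "CYT"]
scores = macro_metrics(y_true, y_pred)
print(scores)
```

Because macro-averaging gives every class equal weight regardless of its size, a classifier can lead on accuracy while trailing on macro-recall when it handles rare classes poorly, which is consistent with the kind of split between SVM and CART reported above.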
