Combining diverse classifiers using precision index functions

This paper introduces a combining classifier based on two proposed precision indexes: the precision index (PIN) and the class-specific precision index (PIC). Comparisons of combining methods typically neglect performance in the high-precision regime. The proposed combining method generates predictions with higher precision and recall than competing methods, making it especially useful for efficient screening of predictions when actual verification is time consuming and costly. Its performance is compared to majority voting, stacking, and clustering-and-selection on two well-known datasets: (1) vowel recognition (Hastie et al., 2009) and (2) yeast protein localisation (Frank and Asuncion, 2010). The precisions obtained exceed results previously reported for the protein localisation data (Horton and Nakai, 1997; Chen, 2010) and for the vowel recognition data (Hastie et al., 2009). A weighted precision index combining the PIC and PIN indexes outperformed all other combining methods at higher precisions.
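
The abstract does not spell out how the PIN and PIC indexes enter the combination rule, so the Python sketch below is only a minimal illustration under stated assumptions: PIN is taken as a base classifier's overall precision on a held-out validation set, PIC as its per-class precision, and each classifier's vote for its predicted class is weighted by a convex combination of the two (the mixing weight alpha and both function names are hypothetical, not taken from the paper).

import numpy as np
from sklearn.metrics import precision_score

def precision_indexes(y_val, pred_val, n_classes):
    # PIN: overall validation precision of one base classifier.
    # PIC: its per-class validation precision.
    # Both definitions are assumptions made for illustration only.
    pin = precision_score(y_val, pred_val, average="micro")
    pic = precision_score(y_val, pred_val, labels=list(range(n_classes)),
                          average=None, zero_division=0)
    return pin, pic

def weighted_precision_vote(test_preds, pins, pics, n_classes, alpha=0.5):
    # Each base classifier votes for its predicted class; the vote is
    # weighted by alpha * PIC[predicted class] + (1 - alpha) * PIN.
    # alpha is a hypothetical mixing weight, not a parameter from the paper.
    n_samples = len(test_preds[0])
    scores = np.zeros((n_samples, n_classes))
    for preds, pin, pic in zip(test_preds, pins, pics):
        for i, c in enumerate(preds):
            scores[i, c] += alpha * pic[c] + (1.0 - alpha) * pin
    return scores.argmax(axis=1)

For the screening use case described above, one could rank test samples by their winning combined score and verify only the top-ranked predictions instead of accepting every argmax label; this thresholding step is likewise an assumption rather than the paper's procedure.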

[1] Ludmila I. Kuncheva, et al. Clustering-and-selection model for classifier combination, 2000, KES'2000, Fourth International Conference on Knowledge-Based Intelligent Engineering Systems and Allied Technologies.

[2] Ching Y. Suen, et al. Application of majority voting to pattern recognition: an analysis of its behavior and performance, 1997, IEEE Trans. Syst. Man Cybern. Part A.

[3] Daijin Ko, et al. Enriching for correct prediction of biological processes using a combination of diverse classifiers, 2011, BMC Bioinformatics.

[4] J. MacQueen. Some methods for classification and analysis of multivariate observations, 1967.

[5] Anders Krogh, et al. Learning with ensembles: How overfitting can be useful, 1995, NIPS.

[6] Kevin W. Bowyer, et al. Combination of multiple classifiers using local accuracy estimates, 1996, Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR).

[7] Paul Horton, et al. Better Prediction of Protein Cellular Localization Sites with the k Nearest Neighbors Classifier, 1997, ISMB.

[8] David H. Wolpert, et al. Stacked generalization, 1992, Neural Networks.

[9] Brian D. Ripley. Pattern Recognition and Neural Networks, 1996, Cambridge University Press.

[10] Yetian Chen. Predicting the Cellular Localization Sites of Proteins Using Bayesian Networks and Bayesian Model Averaging, 2010.

[11] W. N. Venables, et al. Modern Applied Statistics with S, 2002, Springer.

[12] Anthony J. Bonner, et al. Combining classifiers to predict gene function in Arabidopsis thaliana using large-scale gene expression measurements, 2007, BMC Bioinformatics.

[13] Tony R. Martinez, et al. Turning Bayesian model averaging into Bayesian model combination, 2011, The 2011 International Joint Conference on Neural Networks.

[14] T. Hubbard, et al. Using neural networks for prediction of the subcellular location of proteins, 1998, Nucleic Acids Research.

[15] Vladimir Vapnik, et al. Statistical learning theory, 1998.

[16] J. A. Hartigan, et al. A k-means clustering algorithm, 1979.

[17] Ching Y. Suen, et al. A theoretical analysis of the application of majority voting to pattern recognition, 1994, Proceedings of the 12th IAPR International Conference on Pattern Recognition, Vol. 3: Signal Processing.

[18] Yetian Chen. Predicting the Cellular Localization Sites of Proteins Using Decision Tree and Neural Networks, 2008.

[19] Corinna Cortes, et al. Support-Vector Networks, 1995, Machine Learning.

[20] E. Forgy, et al. Cluster analysis of multivariate data: efficiency versus interpretability of classifications, 1965.

[21] Ian Witten, et al. Data Mining, 2000.

[22] Bertrand Clarke, et al. Comparing Bayes Model Averaging and Stacking When Model Approximation Error Cannot be Ignored, 2003, J. Mach. Learn. Res.

[23] Chih-Jen Lin, et al. A Comparison of Methods for Multi-class Support Vector Machines, 2002, IEEE Trans. Neural Networks.

[24] Ludmila I. Kuncheva, et al. Switching between selection and fusion in combining classifiers: an experiment, 2002, IEEE Trans. Syst. Man Cybern. Part B.

[25] Trevor Hastie, et al. The Elements of Statistical Learning, 2009, Springer.

[26] Leo Breiman, et al. Bagging Predictors, 1996, Machine Learning.

[27] Chih-Jen Lin, et al. Training ν-Support Vector Classifiers: Theory and Algorithms, 2001, Neural Computation.

[28] Mohak Shah, et al. Evaluating Learning Algorithms: A Classification Perspective, 2011.

[29] อนิรุธ สืบสิงห์, et al. Data Mining: Practical Machine Learning Tools and Techniques, 2014.

[30] Chih-Jen Lin, et al. LIBSVM: A library for support vector machines, 2011, ACM TIST.

[31] Thomas G. Dietterich. What is machine learning?, 2020, Archives of Disease in Childhood.