An instance-oriented performance measure for classification

Abstract Performance evaluation is essential in data classification, yet existing evaluation methods ignore the characteristics of individual instances, such as their classification difficulty. In practice, it is necessary to measure classification performance from the perspective of instances. This paper proposes an instance-oriented classification performance metric based on the classification difficulty of each instance, named the degree of credibility (Cr). Cr conforms to the natural intuition that the lower the probability of misclassifying relatively easy instances, the more credible the classifier; by focusing on the credibility of each instance's prediction, it opens up a new way to evaluate classifiers. Several important properties of Cr are identified, laying a solid theoretical foundation for classifier evaluation. In addition, the concept of an acceptable classifier is proposed to judge whether a trained model and its parameter set rank among the best at the current level of technology, rather than relying entirely on human experience. Experimental results for twelve classifiers on twelve datasets demonstrate the physical significance, good statistical consistency, and discriminatory ability of Cr, as well as the feasibility of acceptable classifiers for model selection and training. Furthermore, the proposed approximate difficulty greatly improves the computational efficiency of estimating instance difficulty.
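
The abstract does not give the exact formula for Cr, so the following is only a minimal illustrative sketch of the stated intuition: a difficulty-weighted score in which errors on easy instances (low difficulty) reduce credibility more than errors on hard ones. The function name, the `difficulty`/`correct` arguments, and the specific weighting are assumptions, not the paper's actual definition.

```python
import numpy as np

def credibility_sketch(difficulty, correct):
    """Hypothetical difficulty-weighted credibility score (not the paper's Cr).

    difficulty: per-instance difficulty in [0, 1] (0 = easy, 1 = hard).
    correct:    boolean array, True where the classifier's prediction is correct.
    Returns a score in [0, 1]; misclassifying an easy instance costs more.
    """
    difficulty = np.asarray(difficulty, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    easiness = 1.0 - difficulty          # easy instances carry more weight
    total = easiness.sum()
    if total == 0.0:
        return 0.0
    penalty = easiness[~correct].sum()   # weight of the misclassified instances
    return 1.0 - penalty / total

# Two classifiers with the same accuracy (3/4) but errors on different instances:
d = [0.1, 0.2, 0.8, 0.9]                                      # two easy, two hard instances
print(credibility_sketch(d, [True, True, True, False]))       # error on a hard instance -> 0.95
print(credibility_sketch(d, [False, True, True, True]))       # error on an easy instance -> 0.55
```

Under this assumed weighting, the two classifiers tie on accuracy but differ in the sketch score, which mirrors the paper's point that where the errors fall should affect how credible a classifier is judged to be.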
