The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation
[1] L. R. Dice. Measures of the Amount of Ecologic Association Between Species , 1945 .
[2] T. Sørensen,et al. A method of establishing group of equal amplitude in plant sociobiology based on similarity of species content and its application to analyses of the vegetation on Danish commons , 1948 .
[3] Jacob Cohen. A Coefficient of Agreement for Nominal Scales , 1960 .
[4] C. J. van Rijsbergen,et al. Foundation of Evaluation , 1974 .
[5] B. Matthews. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. , 1975, Biochimica et biophysica acta.
[6] J. R. Landis,et al. The measurement of observer agreement for categorical data. , 1977, Biometrics.
[7] Jean M. Tague,et al. The pragmatics of information retrieval experimentation , 1981 .
[8] J. Hanley,et al. The meaning and use of the area under a receiver operating characteristic (ROC) curve. , 1982, Radiology.
[9] M. Appelbaum,et al. Psychometric methods. , 1989, Annual review of psychology.
[10] David D. Lewis,et al. Evaluating Text Categorization I , 1991, HLT.
[11] Nancy Chinchor,et al. MUC-4 evaluation metrics , 1992, MUC.
[12] Jean Tague-Sutcliffe,et al. The Pragmatics of Information Retrieval Experimentation Revisited , 1997, Inf. Process. Manag..
[13] Benoit M. Dawant,et al. Morphometric analysis of white matter lesions in MR images: method and validation , 1994, IEEE Trans. Medical Imaging.
[14] Fredric C. Gey,et al. The relationship between recall and precision , 1994 .
[15] Andrew P. Bradley,et al. The use of the area under the ROC curve in the evaluation of machine learning algorithms , 1997, Pattern Recognit..
[16] Marti A. Hearst. Trends & Controversies: Support Vector Machines , 1998, IEEE Intell. Syst..
[17] U. Alon,et al. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. , 1999, Proceedings of the National Academy of Sciences of the United States of America.
[18] Jonathan Goldstein,et al. When Is "Nearest Neighbor" Meaningful? , 1999, ICDT.
[19] Pierre Baldi,et al. Assessing the accuracy of prediction algorithms for classification: an overview , 2000, Bioinform..
[20] J. Friedman. Stochastic gradient boosting , 2002 .
[21] Hilde van der Togt,et al. Publisher's Note , 2003, J. Netw. Comput. Appl..
[22] José Salvador Sánchez,et al. Strategies for learning in class imbalance problems , 2003, Pattern Recognit..
[23] Peter A. Flach. The Geometry of ROC Space: Understanding Machine Learning Metrics through ROC Isometrics , 2003, ICML.
[24] Leo Breiman,et al. Random Forests , 2001, Machine Learning.
[25] Yiming Yang,et al. RCV1: A New Benchmark Collection for Text Categorization Research , 2004, J. Mach. Learn. Res..
[26] Jan Gorodkin,et al. Comparing two K-category assignments by a K-category correlation coefficient , 2004, Comput. Biol. Chem..
[27] Roman Timofeev,et al. Classification and Regression Trees (CART) Theory and Applications , 2004 .
[28] Charles X. Ling,et al. Using AUC and accuracy in evaluating learning algorithms , 2005, IEEE Transactions on Knowledge and Data Engineering.
[29] N. F. F. Ebecken,et al. On extending F-measure and G-mean metrics to multi-class problems , 2005, Data Mining VI.
[30] George Hripcsak,et al. Technical Brief: Agreement, the F-Measure, and Reliability in Information Retrieval , 2005, J. Am. Medical Informatics Assoc..
[31] Janez Demsar,et al. Statistical Comparisons of Classifiers over Multiple Data Sets , 2006, J. Mach. Learn. Res..
[32] Tom Fawcett,et al. An introduction to ROC analysis , 2006, Pattern Recognit. Lett..
[33] Stan Szpakowicz,et al. Beyond Accuracy, F-Score and ROC: A Family of Discriminant Measures for Performance Evaluation , 2006, Australian Conference on Artificial Intelligence.
[34] Wei Xie,et al. Accurate Cancer Classification Using Expressions of Very Few Genes , 2007, IEEE/ACM Transactions on Computational Biology and Bioinformatics.
[35] Santosh K. Mishra,et al. De novo SVM classification of precursor microRNAs from genomic pseudo hairpins using global and intrinsic folding measures , 2007, Bioinform..
[36] Anne-Laure Boulesteix,et al. Partial least squares: a versatile tool for the analysis of high-dimensional genomic data , 2006, Briefings Bioinform..
[37] S. García,et al. An Extension on "Statistical Comparisons of Classifiers over Multiple Data Sets" for all Pairwise Comparisons , 2008 .
[38] J. Xuan,et al. Classification algorithms for phenotype prediction in genomics and proteomics. , 2008, Frontiers in bioscience : a journal and virtual library.
[39] R. Real,et al. AUC: a misleading measure of the performance of predictive distribution models , 2008 .
[40] Arie Ben-David,et al. Comparison of classification accuracy using Cohen's Weighted Kappa , 2008, Expert Syst. Appl..
[41] José Hernández-Orallo,et al. An experimental comparison of performance measures for classification , 2009, Pattern Recognit. Lett..
[42] Leif E. Peterson. K-nearest neighbor , 2009, Scholarpedia.
[43] Guy Lapalme,et al. A systematic analysis of performance measures for classification tasks , 2009, Inf. Process. Manag..
[44] James D. Malley,et al. Predictor correlation impacts machine learning algorithms: implications for genomic studies , 2009, Bioinform..
[45] David J. Hand,et al. Measuring classifier performance: a coherent alternative to the area under the ROC curve , 2009, Machine Learning.
[46] Zhihua Cai,et al. Evaluation Measures of the Classification Performance of Imbalanced Data Sets , 2009 .
[47] David J Hand,et al. Evaluating diagnostic tests: The area under the ROC curve and the balance of errors , 2010, Statistics in medicine.
[48] Blaise Hanczar,et al. Small-sample precision of ROC-related estimates , 2010, Bioinform..
[49] Kevin C. Dorff,et al. The MicroArray Quality Control (MAQC)-II study of common practices for the development and validation of microarray-based predictive models , 2010, Nature Biotechnology.
[50] José Salvador Sánchez,et al. Theoretical Analysis of a Performance Measure for Imbalanced Data , 2010, 2010 20th International Conference on Pattern Recognition.
[51] Qinghua Hu,et al. A novel measure for evaluating classifiers , 2010, Expert Syst. Appl..
[52] Joachim M. Buhmann,et al. The Balanced Accuracy and Its Posterior Distribution , 2010, 2010 20th International Conference on Pattern Recognition.
[53] C. Tappert,et al. A Survey of Binary Similarity and Distance Measures , 2010 .
[54] Rok Blagus,et al. Class prediction for high-dimensional class-imbalanced data , 2010, BMC Bioinformatics.
[55] Peter A. Flach,et al. A Coherent Interpretation of AUC as a Measure of Aggregated Classification Performance , 2011, ICML.
[56] Jan Hauke,et al. Comparison of Values of Pearson's and Spearman's Correlation Coefficients on the Same Sets of Data , 2011 .
[57] Mohak Shah,et al. Evaluating Learning Algorithms: A Classification Perspective , 2011 .
[58] Veronica Cambiazo,et al. Yeast-based assay identifies novel Shh/Gli target genes in vertebrate development , 2012, BMC Genomics.
[59] Mohak Shah,et al. Evaluating Learning Algorithms: Contents , 2011 .
[60] Charles Parker,et al. An Analysis of Performance Measures for Binary Classifiers , 2011, 2011 IEEE 11th International Conference on Data Mining.
[61] Grigorios Tsoumakas,et al. Random K-labelsets for Multilabel Classification , 2011 .
[62] Philip Sedgwick,et al. Pearson’s correlation coefficient , 2012, BMJ : British Medical Journal.
[63] N. Adams,et al. Measuring classification performance: the hmeasure package , 2012 .
[64] M. Vihinen. How to evaluate performance of prediction methods? Measures and their interpretation in variation effect analysis , 2012, BMC Genomics.
[65] David M. W. Powers,et al. The Problem with Kappa , 2012, EACL.
[66] Cesare Furlanello,et al. A Comparison of MCC and CEN Error Measures in Multi-Class Prediction , 2010, PloS one.
[67] M. McHugh. Interrater reliability: the kappa statistic , 2012, Biochemia medica.
[68] Mohamed Bekkar,et al. Evaluation Measures for Models Assessment over Imbalanced Data Sets , 2013 .
[69] David P. Kreil,et al. A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control consortium , 2014, Nature Biotechnology.
[70] Marco Masseroli,et al. Extended Spearman and Kendall Coefficients for Gene Annotation List Correlation , 2014, CIBB.
[71] J. Keilwagen,et al. Area under Precision-Recall Curves for Weighted and Unweighted Data , 2014, PloS one.
[72] Charles Elkan,et al. Optimal Thresholding of Classifiers to Maximize F1 Measure , 2014, ECML/PKDD.
[73] Fabien Subtil,et al. The precision-recall curve overcame the optimism of the receiver operating characteristic curve in rare diseases. , 2015, Journal of clinical epidemiology.
[74] S. Julious,et al. The disagreeable behaviour of the kappa statistic , 2015, Pharmaceutical statistics.
[75] Fabrizio Sebastiani,et al. An Axiomatically Derived Measure for the Evaluation of Classification Algorithms , 2015, ICTIR.
[76] Takaya Saito,et al. The Precision-Recall Plot Is More Informative than the ROC Plot When Evaluating Binary Classifiers on Imbalanced Datasets , 2015, PloS one.
[77] Various performance measures in Binary classification: An Overview of ROC study , 2015 .
[78] Geoffrey E. Hinton,et al. Deep Learning , 2015, Nature.
[79] David M. W. Powers,et al. What the F-measure doesn't measure: Features, Flaws, Fallacies and Fixes , 2015, ArXiv.
[80] Jiujun Cheng,et al. Research on the Matthews Correlation Coefficients Metrics of Personalized Recommendation Algorithm Evaluation , 2015 .
[81] Peter A. Flach,et al. Precision-Recall-Gain Curves: PR Analysis Done Right , 2015, NIPS.
[82] Adam B. Yedidia. Against the F-score , 2016 .
[83] Sang-Tae Han,et al. Comparison of the Performance Evaluations in Classification , 2016 .
[84] Tianqi Chen,et al. XGBoost: A Scalable Tree Boosting System , 2016, KDD.
[85] Luís Torgo,et al. A Survey of Predictive Modeling on Imbalanced Domains , 2016, ACM Comput. Surv..
[86] K. Pollard,et al. Enhancer–promoter interactions are encoded by complex genomic signatures on looping chromatin , 2016, Nature Genetics.
[87] Claudia Biermann,et al. Mathematical Methods Of Statistics , 2016 .
[88] Davide Ballabio,et al. Multivariate comparison of classification performance measures , 2017 .
[89] Davide Chicco,et al. Ten quick tips for machine learning in computational biology , 2017, BioData Mining.
[90] Pedro J. Ballester,et al. Precision and recall oncology: combining multiple gene mutations for improved identification of drug-sensitive tumours , 2017, Oncotarget.
[91] Eamonn J. Keogh. Nearest Neighbor , 2010, Encyclopedia of Machine Learning.
[92] Sabri Boughorbel,et al. Optimal classifier for imbalanced data using Matthews Correlation Coefficient metric , 2017, PloS one.
[93] Josephine Sarpong Akosa,et al. Predictive Accuracy: A Misleading Performance Measure for Highly Imbalanced Data , 2017 .
[94] Fabio Roli,et al. Designing multi-label classifiers that maximize F measures: State of the art , 2017, Pattern Recognit..
[95] Michael A. Beer,et al. Local epigenomic state cannot discriminate interacting and non-interacting enhancer–promoter pairs with high accuracy , 2018, bioRxiv.
[96] Cesare Furlanello,et al. Phylogenetic convolutional neural networks in metagenomics , 2017, BMC Bioinformatics.
[97] Cesare Furlanello,et al. Distillation of the clinical algorithm improves prognosis by multi-task deep learning in high-risk Neuroblastoma , 2018, PloS one.
[98] Aman Dubey,et al. Evaluation of Approximate Rank-Order Clustering using Matthews Correlation Coefficient , 2018 .
[99] Alexander Dekhtyar,et al. Information Retrieval , 2018, Lecture Notes in Computer Science.
[100] Jaime S. Cardoso,et al. Supervised deep learning embeddings for the prediction of cervical cancer diagnosis , 2018, PeerJ Comput. Sci..
[101] Peter Christen,et al. A note on using the F-measure for evaluating record linkage algorithms , 2017, Statistics and Computing.
[102] J B Brown,et al. Classifiers and their Metrics Quantified , 2018, Molecular informatics.
[103] Michael M. Hoffman,et al. Virtual ChIP-seq: predicting transcription factor binding by learning from the transcriptome , 2018, Genome Biology.
[104] Amalia Luque,et al. The impact of class imbalance in classification performance metrics based on the binary confusion matrix , 2019, Pattern Recognit..
[105] Rosario Delgado,et al. Enhancing Confusion Entropy (CEN) for binary and multiclass classification , 2019, PloS one.
[106] Davide Chicco,et al. Computational prediction of diagnosis and feature selection on mesothelioma patient health records , 2019, PloS one.
[107] Xavier-Andoni Tibau,et al. Why Cohen’s Kappa should be avoided as performance measure in classification , 2019, PloS one.
[108] David M. W. Powers,et al. Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation , 2011, ArXiv.
[109] Alaa Tharwat,et al. Classification assessment methods , 2020, Applied Computing and Informatics.