Deconstructing Cross-Entropy for Probabilistic Binary Classifiers

In this work, we analyze the cross-entropy function, which is widely used in classifiers both as a performance measure and as an optimization objective. We place cross-entropy in the context of Bayesian decision theory, the formal probabilistic framework for making decisions, and we analyze its motivation, meaning and interpretation from an information-theoretical point of view. The article makes four contributions. First, we explicitly analyze the contribution to cross-entropy of (i) prior knowledge and (ii) the value of the features, expressed as a likelihood ratio. Second, we introduce a decomposition of cross-entropy into two components: discrimination and calibration. This decomposition allows different aspects of a classifier's performance to be measured more precisely, and it justifies previously reported strategies that obtain reliable probabilities by calibrating the output of a discriminating classifier. Third, we give several information-theoretical interpretations of cross-entropy, which can be useful in different application scenarios and which are related to the concept of reference probabilities. Fourth, we present an analysis tool, the Empirical Cross-Entropy (ECE) plot, a compact representation of cross-entropy and its decomposition. We show the power of ECE plots, compared with other classical performance representations, in two diverse experimental examples: a speaker verification system and a forensic case involving glass findings.

Daniel Ramos | Javier Franco-Pedroso | Joaquín González-Rodríguez | Alicia Lozano-Diez
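
The abstract describes how the prior and the likelihood ratio each contribute to cross-entropy. The sketch below is one way to illustrate this. It is not taken from the paper. It assumes that empirical cross-entropy is the prior-weighted average of -log2 of the posterior probability assigned to the true class, with the posterior obtained from Bayes' theorem in odds form (posterior odds = likelihood ratio x prior odds). The function names `empirical_cross_entropy` and `neutral_reference`, the class labels h1/h2, and the toy likelihood ratios are all hypothetical. The calibration/discrimination decomposition itself is not implemented.

```python
import numpy as np


def empirical_cross_entropy(lr_h1, lr_h2, prior_h1):
    """Empirical cross-entropy in bits (assumed definition, see lead-in).

    lr_h1: likelihood ratios from trials where class h1 is true.
    lr_h2: likelihood ratios from trials where class h2 is true.
    prior_h1: prior probability of h1, strictly between 0 and 1.
    """
    lr_h1 = np.asarray(lr_h1, dtype=float)
    lr_h2 = np.asarray(lr_h2, dtype=float)
    prior_odds = prior_h1 / (1.0 - prior_h1)

    # -log2 P(h1 | x) = log2(1 + 1 / (LR * prior_odds)) when h1 is true
    cost_h1 = np.mean(np.log2(1.0 + 1.0 / (lr_h1 * prior_odds)))
    # -log2 P(h2 | x) = log2(1 + LR * prior_odds) when h2 is true
    cost_h2 = np.mean(np.log2(1.0 + lr_h2 * prior_odds))

    return prior_h1 * cost_h1 + (1.0 - prior_h1) * cost_h2


def neutral_reference(prior_h1):
    """Cross-entropy of a classifier that ignores the features (LR = 1).

    With LR = 1 the posterior equals the prior, so this reduces to the
    binary entropy of the prior.
    """
    return empirical_cross_entropy([1.0], [1.0], prior_h1)


if __name__ == "__main__":
    # Toy likelihood ratios, invented for illustration only.
    lr_h1 = [8.0, 4.0, 2.0]      # trials where h1 is true
    lr_h2 = [0.125, 0.25, 0.5]   # trials where h2 is true

    # Sweep the prior, as an ECE plot does along its horizontal axis.
    for prior in (0.1, 0.5, 0.9):
        ece = empirical_cross_entropy(lr_h1, lr_h2, prior)
        ref = neutral_reference(prior)
        print(f"prior={prior:.1f}  ECE={ece:.3f} bits  neutral={ref:.3f} bits")
```

Evaluating the same set of likelihood ratios over a range of priors separates the classifier's contribution (the likelihood ratios) from the prior. When the likelihood ratios point toward the wrong class, the cross-entropy can rise above the neutral reference, which is the situation a calibration step is meant to correct.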
