An Optimization Approach of Deriving Bounds between Entropy and Error from Joint Distribution: Case Study for Binary Classifications

In this work, we propose a new approach to deriving bounds between entropy and error from a joint distribution by means of optimization. A specific case study is given for binary classification. Two basic types of classification error are investigated, namely Bayesian and non-Bayesian errors; non-Bayesian errors are considered because most classifiers produce non-Bayesian solutions. For both types of error, we derive closed-form relations between each bound and the error components. When Fano’s lower bound in a diagram of “Error Probability vs. Conditional Entropy” is realized with this approach, its interpretation is broadened to include non-Bayesian errors and the situations arising from independence properties of the variables. A new upper bound on the Bayesian error is derived with respect to the minimum prior probability; it is generally tighter than Kovalevskij’s upper bound.
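
For the binary case, the kind of entropy–error relationship the abstract refers to can be illustrated numerically. The sketch below is a minimal illustration, not the paper's derivation: it assumes the two-class form of Fano's inequality, H(Y|X) ≤ H_b(P_e), and a Kovalevskij-type upper bound of the form P_e ≤ H(Y|X)/2 with entropy in bits; the example joint distribution and all function names are hypothetical.

```python
import numpy as np

def binary_entropy(p):
    """Binary entropy H_b(p) in bits, with H_b(0) = H_b(1) = 0."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def bayes_error(joint):
    """Bayes error for a discrete joint P(x, y): sum over x of the smaller class mass."""
    return float(np.sum(np.min(joint, axis=1)))

def conditional_entropy(joint):
    """H(Y|X) in bits for a discrete joint P(x, y) with rows indexed by x."""
    px = joint.sum(axis=1, keepdims=True)          # marginal P(x)
    cond = np.clip(joint / px, 1e-12, 1.0)         # P(y|x), clipped to avoid log(0)
    h_per_x = -np.sum(cond * np.log2(cond), axis=1)
    return float(np.sum(px.ravel() * h_per_x))

def fano_lower_bound(h_cond):
    """Smallest Pe in [0, 1/2] with H_b(Pe) >= H(Y|X), found on a fine grid."""
    grid = np.linspace(0.0, 0.5, 100001)
    feasible = grid[binary_entropy(grid) >= h_cond - 1e-12]
    return float(feasible[0]) if feasible.size else 0.5

# Hypothetical joint distribution P(x, y): rows = feature value x, columns = class y.
joint = np.array([[0.35, 0.05],
                  [0.10, 0.50]])

pe = bayes_error(joint)
h = conditional_entropy(joint)
print(f"Bayes error Pe                         = {pe:.4f}")
print(f"Conditional entropy H(Y|X)             = {h:.4f} bits")
print(f"Fano lower bound on Pe                 = {fano_lower_bound(h):.4f}")
print(f"Kovalevskij-type upper bound H(Y|X)/2  = {h / 2:.4f}")
```

For this example the Bayes error (0.15) falls between the Fano lower bound (about 0.146) and the Kovalevskij-type upper bound (about 0.30), which is the ordering the bounds are meant to guarantee.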
