Derivations of Normalized Mutual Information in Binary Classifications

Although conventional performance indexes, such as accuracy, are commonly used in classifier selection and evaluation, information-based criteria, such as mutual information, are becoming popular in feature and model selection. In this work, we analyze classifier learning under the criterion of maximizing normalized mutual information (NI), which is novel and well defined on a compact range for classifier evaluation. We derive closed-form relations between normalized mutual information and accuracy, precision, and recall in binary classifications. Exploring these relations reveals that NI is in fact a set of nonlinear functions, with a concordant power-exponent form, of each performance index. The relations can equivalently be expressed in terms of precision and recall, or of false-alarm rate and hit rate (recall).
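For intuition, every quantity involved can be computed from the four cells of a binary confusion matrix. Below is a minimal sketch, assuming the common normalization NI = I(T;Y) / H(T), where T is the true label and Y the prediction; the paper's exact normalization may differ, and the function name `normalized_mi` and the example counts are illustrative only.

```python
import numpy as np

def normalized_mi(tp, fn, fp, tn):
    """Normalized mutual information NI = I(T;Y) / H(T) computed from a
    binary confusion matrix. Normalizing by the target entropy H(T) is
    one common convention and is assumed here."""
    joint = np.array([[tp, fn], [fp, tn]], dtype=float)
    joint /= joint.sum()              # joint distribution p(t, y)
    pt = joint.sum(axis=1)            # marginal p(t) over true labels
    py = joint.sum(axis=0)            # marginal p(y) over predictions
    # I(T;Y) = sum_{t,y} p(t,y) log2( p(t,y) / (p(t) p(y)) ), with 0 log 0 = 0
    mi = sum(joint[i, j] * np.log2(joint[i, j] / (pt[i] * py[j]))
             for i in range(2) for j in range(2) if joint[i, j] > 0)
    ht = -sum(p * np.log2(p) for p in pt if p > 0)   # target entropy H(T)
    return mi / ht

# Illustrative counts: 90 hits, 10 misses, 5 false alarms, 95 correct rejections
print(normalized_mi(tp=90, fn=10, fp=5, tn=95))
```

The same four counts yield accuracy (tp + tn) / N, precision tp / (tp + fp), and recall tp / (tp + fn), which is why closed-form relations between NI and these indexes can be derived at all.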
