Understanding Malvestuto’s Normalized Mutual Information

Malvestuto’s version of the normalized mutual information is a well-known information-theoretic index for quantifying agreement between two partitions. To better understand what the index reflects about agreement between clusters, we study components of the index that contain information on individual clusters, using mathematical analysis and numerical examples. The indices for individual clusters provide useful information about specific clusters.
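As a minimal sketch of how such an index and its cluster-level components can be computed from a contingency table, the Python below assumes the normalization NMI = 2 I(U;V) / (H(U) + H(V)). This is an assumption: other NMI variants divide by the geometric mean or the maximum of the two entropies, so check it against Malvestuto's definition. The function names are hypothetical. The per-cluster split shown (each term is the cluster's proportion times the Kullback-Leibler divergence of its distribution over V from V's marginal) is one natural additive decomposition, not necessarily the one studied in the paper.

```python
import math
from typing import List, Sequence, Tuple


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; entries with p == 0 contribute 0 (0 * log 0 := 0)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def nmi_with_cluster_terms(table: List[List[int]]) -> Tuple[float, List[float]]:
    """Hypothetical helper: NMI = 2 * I(U;V) / (H(U) + H(V)) plus per-cluster terms.

    Assumes the "sum of entropies" normalization (an assumption, see lead-in).
    table[i][j] counts the objects placed in cluster i of partition U and
    cluster j of partition V. Returns (nmi, terms), where terms[i] is the
    contribution of cluster i of U; the terms sum to nmi.
    """
    n = sum(sum(r) for r in table)
    if n <= 0:
        raise ValueError("contingency table must contain at least one object")
    p = [[c / n for c in r] for r in table]   # joint proportions p_ij
    row = [sum(r) for r in p]                  # marginal proportions of U's clusters
    col = [sum(c) for c in zip(*p)]            # marginal proportions of V's clusters
    denom = entropy(row) + entropy(col)
    if denom == 0:
        # Both partitions consist of a single cluster; the ratio is undefined.
        raise ValueError("NMI is undefined when H(U) + H(V) == 0")

    terms = []
    for i, r in enumerate(p):
        # I_i = sum_j p_ij * log(p_ij / (p_i. * p_.j)). This equals p_i. times the
        # Kullback-Leibler divergence of cluster i's distribution over V from V's
        # marginal, so each I_i is non-negative and sum_i I_i = I(U;V).
        mi_i = sum(pij * math.log(pij / (row[i] * col[j]))
                   for j, pij in enumerate(r) if pij > 0)
        terms.append(2 * mi_i / denom)
    return sum(terms), terms


# Usage: identical partitions give NMI = 1, split evenly over two equal clusters.
nmi, terms = nmi_with_cluster_terms([[5, 0], [0, 5]])
print(round(nmi, 6), [round(t, 6) for t in terms])  # 1.0 [0.5, 0.5]
```

Natural logarithms are used for convenience; the base cancels in the ratio, so the index does not depend on it.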
