Taxonomy of nominal type histogram distance measures

Distance and similarity measures are fundamental to pattern classification, clustering, and information retrieval. This work reviews the distance/similarity measures applicable to comparing two nominal type histograms and categorizes them by both their syntactic and their semantic relationships. A correlation coefficient and a hierarchical clustering technique are then used to reveal similarities among the many measures.
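As a rough illustration of the idea of correlating and clustering the measures, the sketch below evaluates four common histogram distances on randomly generated pairs of normalized histograms, computes the Pearson correlation between the resulting value lists, and then merges the measures by single-linkage agglomerative clustering on the dissimilarity 1 - r. The choice of measures, the random sampling scheme, the single-linkage rule, and all function names are assumptions made for this example; the abstract does not specify the experimental protocol.

```python
import math
import random

# Four illustrative distances between normalized histograms p and q.
def city_block(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def intersection(p, q):
    # Distance form of histogram intersection: 1 - sum of bin-wise minima.
    return 1.0 - sum(min(a, b) for a, b in zip(p, q))

def squared_chord(p, q):
    return sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))

MEASURES = {
    "city_block": city_block,
    "euclidean": euclidean,
    "intersection": intersection,
    "squared_chord": squared_chord,
}

def random_histogram(bins, rng):
    # Hypothetical sampling scheme: uniform weights normalized to sum to 1.
    weights = [rng.random() + 1e-9 for _ in range(bins)]
    total = sum(weights)
    return [w / total for w in weights]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_table(n_pairs=200, bins=8, seed=0):
    # Evaluate every measure on the same random histogram pairs,
    # then correlate the value lists of each pair of measures.
    rng = random.Random(seed)
    pairs = [(random_histogram(bins, rng), random_histogram(bins, rng))
             for _ in range(n_pairs)]
    values = {name: [f(p, q) for p, q in pairs] for name, f in MEASURES.items()}
    names = list(MEASURES)
    return {(a, b): pearson(values[a], values[b]) for a in names for b in names}

def single_linkage(names, corr):
    # Agglomerative clustering using dissimilarity 1 - r between measures.
    # Returns the merges in order as (left members, right members, distance).
    clusters = [frozenset([n]) for n in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(1.0 - corr[(a, b)]
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

if __name__ == "__main__":
    table = correlation_table()
    for left, right, d in single_linkage(list(MEASURES), table):
        print(left, "+", right, "at", round(d, 4))
```

For histograms normalized to sum to 1, the intersection distance equals half the city block distance, so these two measures are perfectly correlated and should be merged first. The remaining merge order depends on the sampled data.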
