Relaxed triangle inequality ratio of the Sørensen-Dice and Tversky indexes

Abstract In this work, we calculate a tight relaxed triangle inequality ratio for two of the most well-known indexes used to measure dissimilarity between two finite sets, the Sørensen-Dice and Tversky indexes. The relaxed triangle inequality ratio affects the efficiency and approximation ratios of recent algorithms for many combinatorial problems, such as the traveling salesman problem and nearest neighbor search. For that reason, many works provide ratios for several other indexes. We focus on the Tversky index, which generalizes many dissimilarity indexes commonly used in practice, and provide its tight ratio. Because the Sørensen-Dice index is a special case of the Tversky index, it follows from our results that the tight ratio for the Sørensen-Dice index is 1.5.
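The claim about the ratio 1.5 can be checked by brute force on a small universe. The sketch below is illustrative only and rests on assumptions not stated in the abstract: it uses the standard definition of the Tversky index, S(A, B) = |A ∩ B| / (|A ∩ B| + α|A \ B| + β|B \ A|), takes the dissimilarity to be d = 1 − S, and reads the relaxed triangle inequality as d(A, C) ≤ ρ · (d(A, B) + d(B, C)). The function names are hypothetical. Under these definitions, α = β = 1/2 gives the Sørensen-Dice index and α = β = 1 gives the Jaccard index.

```python
from fractions import Fraction
from itertools import combinations, product


def tversky_dissimilarity(a, b, alpha, beta):
    """1 minus the Tversky index of two non-empty finite sets (assumed definition)."""
    common = len(a & b)
    denom = common + alpha * len(a - b) + beta * len(b - a)
    return 1 - Fraction(common) / denom


def nonempty_subsets(n):
    """All non-empty subsets of {0, ..., n-1} as frozensets."""
    items = range(n)
    return [frozenset(c) for k in range(1, n + 1) for c in combinations(items, k)]


def max_ratio(n, alpha, beta):
    """Largest d(A,C) / (d(A,B) + d(B,C)) over all triples of non-empty subsets."""
    sets = nonempty_subsets(n)
    best = Fraction(0)
    for a, b, c in product(sets, repeat=3):
        total = (tversky_dissimilarity(a, b, alpha, beta)
                 + tversky_dissimilarity(b, c, alpha, beta))
        if total == 0:
            # Only happens when A = B = C, where d(A,C) is also 0.
            continue
        best = max(best, tversky_dissimilarity(a, c, alpha, beta) / total)
    return best


if __name__ == "__main__":
    half = Fraction(1, 2)
    print("Sørensen-Dice (alpha = beta = 1/2):", max_ratio(4, half, half))
    print("Jaccard (alpha = beta = 1):", max_ratio(4, 1, 1))
```

For Sørensen-Dice, the triple A = {0}, B = {0, 1}, C = {1} gives d(A, C) = 1 and d(A, B) = d(B, C) = 1/3, so the ratio 1 / (2/3) = 1.5 is attained already on a two-element universe. A finite search like this only illustrates the bound; it does not prove it for all finite sets.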
