The Calculation of Similarity and Its Application in Data Mining

Similarity is a measure of the strength of the relationship between two objects, that is, of how closely they resemble each other. Different object types call for different similarity calculation methods. Similarity calculation is widely used in data classification and is the basis of object classification. In this paper, data objects are divided into three kinds: numerical, non-numeric, and mixed, and the similarity calculation methods for each type are discussed. Finally, we illustrate the application of similarity in data classification and data clustering.
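The abstract does not say which formulas the paper adopts for each data type. The sketch below makes that assumption explicit and uses common textbook choices. Numerical data is compared with the Minkowski distance (p=1 is Manhattan, p=2 is Euclidean, p=infinity is Chebyshev). Non-numeric (categorical) data uses the simple matching coefficient. Mixed records use a Gower-style average of per-attribute similarities. A nearest-neighbor rule then shows how a similarity measure drives classification. All function names and the distance-to-similarity conversion `1 / (1 + d)` are hypothetical choices made for illustration.

```python
import math
from typing import Callable, Optional, Sequence, Union

Value = Union[float, str]


def minkowski_distance(x: Sequence[float], y: Sequence[float], p: float = 2.0) -> float:
    """Minkowski distance for numeric vectors (p=1 Manhattan, p=2 Euclidean, p=inf Chebyshev)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if p < 1:
        raise ValueError("p must be >= 1 for a metric")
    if math.isinf(p):
        # Limit case p -> infinity: largest per-attribute difference (Chebyshev distance).
        return max(abs(a - b) for a, b in zip(x, y))
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def matching_similarity(x: Sequence[str], y: Sequence[str]) -> float:
    """Simple matching coefficient for categorical vectors: share of attributes that agree."""
    if len(x) != len(y) or not x:
        raise ValueError("vectors must be non-empty and of equal length")
    return sum(a == b for a, b in zip(x, y)) / len(x)


def gower_similarity(x: Sequence[Value], y: Sequence[Value],
                     ranges: Sequence[Optional[float]]) -> float:
    """Gower-style similarity for mixed records.

    ranges[i] is the value range (max - min over the data set) of numeric
    attribute i, or None if attribute i is categorical (an assumption of this sketch).
    """
    if not (len(x) == len(y) == len(ranges)) or not x:
        raise ValueError("records and ranges must be non-empty and of equal length")
    total = 0.0
    for a, b, r in zip(x, y, ranges):
        if r is None:
            total += 1.0 if a == b else 0.0          # categorical: match / mismatch
        elif r > 0:
            total += 1.0 - abs(float(a) - float(b)) / r  # numeric: range-normalized
        else:
            total += 1.0                              # constant attribute: always equal
    return total / len(x)


def distance_to_similarity(d: float) -> float:
    """Map a distance in [0, inf) to a similarity in (0, 1]."""
    return 1.0 / (1.0 + d)


def euclidean_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    return distance_to_similarity(minkowski_distance(x, y, p=2.0))


def nearest_neighbor_label(query, samples, labels,
                           similarity: Callable[[Sequence, Sequence], float]):
    """1-nearest-neighbor classification: return the label of the most similar sample."""
    best = max(range(len(samples)), key=lambda i: similarity(query, samples[i]))
    return labels[best]


if __name__ == "__main__":
    print(minkowski_distance([0, 0], [3, 4]))        # 5.0 (Euclidean)
    print(minkowski_distance([0, 0], [3, 4], p=1))   # 7.0 (Manhattan)
    print(matching_similarity(["red", "small", "round"], ["red", "large", "round"]))
    samples = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.5, 4.5]]
    labels = ["A", "A", "B", "B"]
    print(nearest_neighbor_label([1.1, 0.9], samples, labels, euclidean_similarity))  # A
```

The same similarity functions could serve clustering as well, for example as the assignment step of a k-means-style algorithm. Which measure fits best depends on attribute types and scales, which is the distinction the abstract draws between numerical, non-numeric, and mixed data.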
