A Correlation-Based Distance Function for Nearest Neighbor Classification

The Nearest Neighbor rule is a well-known classification method, widely studied in the pattern recognition community for both its simplicity and its performance. The choice of distance function is central to obtaining good accuracy on a given data set, and different distance functions have been proposed to improve performance. This paper proposes a new distance function based on the correlation of fuzzy sets, called the Fuzzy Correlation-based Difference Metric. The proposed distance function is a generalization of the Value Difference Metric and applies to both nominal and continuous attributes in a uniform way. Fuzzy sets are used to represent numeric attributes, and a uninorm operator is used to aggregate local differences. Experimental results using a standard $k$-NN algorithm show a significant improvement over previously proposed distance functions.
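For context, the classical Value Difference Metric that the proposed function generalizes can be sketched for a single nominal attribute as below. This is a minimal illustration of the standard VDM only, not of the Fuzzy Correlation-based Difference Metric itself. The function names `fit_vdm` and `vdm`, the exponent `q=2`, and the toy data are assumptions made for the example.

```python
from collections import Counter, defaultdict


def fit_vdm(values, labels):
    """Estimate P(class | value) for one nominal attribute from training data."""
    counts = defaultdict(Counter)
    for v, c in zip(values, labels):
        counts[v][c] += 1
    classes = sorted(set(labels))
    probs = {}
    for v, ctr in counts.items():
        total = sum(ctr.values())
        # Conditional class distribution for attribute value v.
        probs[v] = {c: ctr[c] / total for c in classes}
    return probs, classes


def vdm(probs, classes, x, y, q=2):
    """Standard VDM between two attribute values: sum_c |P(c|x) - P(c|y)|^q."""
    return sum(abs(probs[x][c] - probs[y][c]) ** q for c in classes)


# Toy data (hypothetical): one nominal attribute and a class label per instance.
values = ["red", "red", "blue", "blue", "green"]
labels = ["a", "a", "b", "b", "a"]
probs, classes = fit_vdm(values, labels)

d_red_blue = vdm(probs, classes, "red", "blue")
d_red_green = vdm(probs, classes, "red", "green")
```

In this toy data, "red" and "green" have identical class distributions, so their distance is zero, while "red" and "blue" predict opposite classes and are maximally far apart. Two values are close under VDM when they lead to similar class distributions, regardless of how the values are named.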
