Correlation Weighted Heterogeneous Euclidean-Overlap Metric

Abstract: Many data mining algorithms depend on a good distance function to be successful. Among the many distance functions available, the Heterogeneous Euclidean-Overlap Metric (HEOM for short) is one of the simplest yet most effective for applications with both continuous and nominal attributes. To improve its generalization performance, we present an improved HEOM based on correlation weighting, which we call the Correlation Weighted Heterogeneous Euclidean-Overlap Metric (CWHEOM for short). In CWHEOM, we apply different correlation functions for discrete-class and continuous-class problems to estimate the correlation between each attribute variable and the class variable. Experiments on 36 discrete-class data sets and 36 continuous-class data sets validate its effectiveness.
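To make the idea concrete, here is a minimal Python sketch of HEOM and its correlation-weighted variant. It assumes the standard HEOM definition (missing values count as the maximal distance 1, overlap distance for nominal attributes, range-normalized absolute difference for numeric ones). The abstract does not specify which correlation functions CWHEOM uses, so the Pearson-based weight estimator below is one plausible choice for continuous-class problems, not the paper's exact formulation; for discrete-class problems one might instead use a measure such as mutual information.

```python
import math

def heom(x, y, ranges, nominal):
    """Heterogeneous Euclidean-Overlap Metric (HEOM).

    x, y    -- instances as lists of attribute values (None = missing)
    ranges  -- per-attribute range (max - min); used only for numeric attributes
    nominal -- per-attribute flags: True if the attribute is nominal
    """
    total = 0.0
    for a, (xa, ya) in enumerate(zip(x, y)):
        if xa is None or ya is None:      # missing value -> maximal distance
            d = 1.0
        elif nominal[a]:                  # overlap metric for nominal attributes
            d = 0.0 if xa == ya else 1.0
        else:                             # range-normalized difference for numeric ones
            d = abs(xa - ya) / ranges[a] if ranges[a] > 0 else 0.0
        total += d * d
    return math.sqrt(total)

def cwheom(x, y, ranges, nominal, weights):
    """Correlation-weighted HEOM: each per-attribute squared distance is
    scaled by a weight reflecting that attribute's correlation with the
    class variable (the weights are estimated from training data)."""
    total = 0.0
    for a, (xa, ya) in enumerate(zip(x, y)):
        if xa is None or ya is None:
            d = 1.0
        elif nominal[a]:
            d = 0.0 if xa == ya else 1.0
        else:
            d = abs(xa - ya) / ranges[a] if ranges[a] > 0 else 0.0
        total += weights[a] * d * d
    return math.sqrt(total)

def pearson_weight(attr_values, class_values):
    """|Pearson correlation| between a numeric attribute and a numeric class.
    An illustrative weight function for continuous-class problems; the paper's
    actual correlation functions are not given in the abstract."""
    n = len(attr_values)
    mx = sum(attr_values) / n
    my = sum(class_values) / n
    cov = sum((xa - mx) * (ya - my) for xa, ya in zip(attr_values, class_values))
    sx = math.sqrt(sum((xa - mx) ** 2 for xa in attr_values))
    sy = math.sqrt(sum((ya - my) ** 2 for ya in class_values))
    return abs(cov / (sx * sy)) if sx > 0 and sy > 0 else 0.0

# Example: two instances with one numeric and one nominal attribute.
x = [5.0, "red"]
y = [2.0, "blue"]
ranges = [10.0, None]            # range only matters for the numeric attribute
nominal = [False, True]
print(heom(x, y, ranges, nominal))                # sqrt(0.3^2 + 1^2) ~= 1.044
print(cwheom(x, y, ranges, nominal, [0.8, 0.2]))  # sqrt(0.8*0.09 + 0.2*1) ~= 0.522
```

Note how the weighted variant in this sketch downweights the nominal attribute, so a mismatch there moves the instances apart less than it would under plain HEOM; this is the mechanism by which correlation weighting lets class-relevant attributes dominate the distance.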
