Paper information - Distance Metrics for Instance-Based Learning

Distance Metrics for Instance-Based Learning

Instance-based learning techniques use a set of stored training instances to classify new examples. The most common such learning technique is the nearest neighbor method, in which new instances are classified according to the closest training instance. A critical element of any such method is the metric used to determine distance between instances. Euclidean distance is by far the most commonly used metric; no one, however, has systematically considered whether a different metric, such as Manhattan distance, might perform equally well on naturally occurring data sets. Some evidence from psychological research indicates that Manhattan distance might be preferable in some circumstances. This paper examines three different distance metrics and presents experimental comparisons using data from three domains: malignant cancer classification, heart disease diagnosis, and diabetes prediction. The results of these studies indicate that the Manhattan distance metric works quite well, although not better than the Euclidean metric that has become a standard for machine learning experiments. Because the nearest neighbor technique provides a good benchmark for comparisons with other learning algorithms, the results below include a number of such comparisons, which show that nearest neighbor, using any distance metric, compares quite well to other machine learning techniques.
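As an illustration of why the choice of metric matters, the following is a minimal sketch of a 1-nearest-neighbor classifier with a pluggable distance function. It is not the paper's implementation: the function names (`euclidean`, `manhattan`, `nearest_neighbor`), the toy training points, and the labels `"a"` and `"b"` are all hypothetical and chosen only so that the two metrics disagree.

```python
from math import sqrt


def euclidean(a, b):
    """Euclidean distance: square root of the summed squared differences."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Manhattan (city-block) distance: sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def nearest_neighbor(query, training, metric):
    """Return the label of the stored instance closest to `query`.

    `training` is a list of (feature_tuple, label) pairs (hypothetical layout).
    """
    _, label = min(training, key=lambda inst: metric(query, inst[0]))
    return label


# Toy data, invented for illustration only.
training = [
    ((3.0, 0.0), "a"),  # Euclidean 3.00, Manhattan 3.0 from the origin
    ((2.0, 2.0), "b"),  # Euclidean ~2.83, Manhattan 4.0 from the origin
]
query = (0.0, 0.0)

print(nearest_neighbor(query, training, euclidean))  # closest under Euclidean
print(nearest_neighbor(query, training, manhattan))  # closest under Manhattan
```

In this toy case the query's nearest stored instance is `"b"` under Euclidean distance but `"a"` under Manhattan distance, so the same training set can yield different classifications depending on the metric.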

Steven Salzberg
