Distance Preserving Mapping from Categories to Numbers for Indexing

Memory-Based Reasoning and K-Nearest Neighbor Searching are frequently adopted data mining techniques, but they scale poorly. Indexing is a promising solution. However, categorical attributes are difficult to index, since the categories of a nominal attribute have no linear ordering. In this paper, we propose heuristic algorithms that map categories to numbers while preserving as many of the distance relationships among categories as possible. We empirically study the performance of the algorithms under different distance situations.
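
To make the objective concrete, the following is a minimal sketch of what "preserving distance relationships" could mean for a small attribute. It is not one of the heuristic algorithms of this paper: it is an exhaustive search, feasible only for a handful of categories, and it assumes that a relationship is an ordering of the form d(a, b) < d(a, c) which counts as preserved when the mapped numbers keep the same ordering. The names `preserved_count` and `best_mapping` and the toy distance table are hypothetical.

```python
from itertools import permutations


def preserved_count(mapping, dist):
    """Count ordered triples (a, b, c) with dist[a][b] < dist[a][c]
    whose ordering also holds between the mapped numbers."""
    cats = list(mapping)
    count = 0
    for a in cats:
        for b in cats:
            for c in cats:
                if len({a, b, c}) < 3:
                    continue  # need three distinct categories
                if (dist[a][b] < dist[a][c]
                        and abs(mapping[a] - mapping[b]) < abs(mapping[a] - mapping[c])):
                    count += 1
    return count


def best_mapping(dist):
    """Try every assignment of categories to 0..n-1 (brute force, assumed
    small n) and keep the first one with the highest preserved count."""
    cats = sorted(dist)
    best, best_score = None, -1
    for perm in permutations(range(len(cats))):
        mapping = dict(zip(cats, perm))
        score = preserved_count(mapping, dist)
        if score > best_score:
            best, best_score = mapping, score
    return best, best_score


# Hypothetical symmetric distances among three categories.
dist = {
    "red":    {"red": 0, "orange": 1, "yellow": 2},
    "orange": {"red": 1, "orange": 0, "yellow": 1},
    "yellow": {"red": 2, "orange": 1, "yellow": 0},
}
mapping, score = best_mapping(dist)
print(mapping, score)
```

In this toy table "orange" is equally close to the other two categories, so a mapping that places it between them keeps both strict orderings. With more categories a perfect mapping generally does not exist and the search space grows factorially, which is why heuristics are needed.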