An information theoretic similarity-based learning method for databases

Similarity-based learning has been applied widely and successfully in a number of domains. Despite these successes, most similarity measures in the current literature are defined only on a limited set of feature types, so they cannot be applied directly in a database environment, where attributes take a wide variety of data types. In this paper, we propose a new information-theoretic method of similarity-based learning for databases. It improves on current similarity measures in several ways: similarity is defined on every attribute type in the database, and each attribute is assigned a weight according to its importance with respect to the target attribute. In addition, our nearest neighbor algorithm assigns different weights to the selected instances. The system has been implemented and tested on several typical machine learning databases. Our experiments show that its classification accuracy is, in general, superior to that of other learning methods.
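The following is a minimal sketch of the general kind of procedure described above, not the method of the paper itself. The abstract gives no formulas, so every concrete choice here is an assumption made for illustration: information gain as the attribute weight, equal-width binning of numeric attributes when computing that gain, an overlap measure for categorical attributes, a range-normalized difference for numeric ones, and similarity-weighted voting among the k nearest instances. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def is_numeric(v):
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def discretize(values, bins=3):
    """Equal-width binning (an assumption; used only to compute gain)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def information_gain(values, labels):
    """Entropy of the target minus its entropy conditioned on the attribute."""
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)
    n = len(labels)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def attribute_weights(rows, labels):
    """One weight per attribute, normalized to sum to 1."""
    gains = []
    for col in zip(*rows):
        col = list(col)
        if all(is_numeric(v) for v in col):
            col = discretize(col)
        gains.append(information_gain(col, labels))
    total = sum(gains)
    if total == 0:  # no attribute is informative: fall back to equal weights
        return [1.0 / len(gains)] * len(gains)
    return [g / total for g in gains]


def similarity(a, b, weights, ranges):
    """Weighted sum of per-attribute similarities, each in [0, 1]."""
    score = 0.0
    for j, w in enumerate(weights):
        if ranges[j] is None:  # categorical attribute: exact-match overlap
            s = 1.0 if a[j] == b[j] else 0.0
        elif ranges[j] == 0:  # constant numeric attribute
            s = 1.0 if a[j] == b[j] else 0.0
        else:  # numeric attribute: range-normalized difference, clamped at 0
            s = max(0.0, 1.0 - abs(a[j] - b[j]) / ranges[j])
        score += w * s
    return score


def classify(query, rows, labels, k=3):
    """Predict a label by similarity-weighted voting among the k nearest rows."""
    columns = [list(col) for col in zip(*rows)]
    ranges = [
        (max(col) - min(col)) if all(is_numeric(v) for v in col) else None
        for col in columns
    ]
    weights = attribute_weights(rows, labels)
    scored = sorted(
        ((similarity(query, r, weights, ranges), y) for r, y in zip(rows, labels)),
        key=lambda t: t[0],
        reverse=True,
    )[:k]
    votes = defaultdict(float)
    for s, y in scored:
        votes[y] += s  # more similar instances contribute more to the vote
    return max(votes, key=votes.get)


# Toy data (made up for illustration): one categorical and one numeric attribute.
rows = [("red", 1.0), ("red", 1.2), ("blue", 3.0), ("blue", 3.1), ("red", 2.9)]
labels = ["a", "a", "b", "b", "b"]
print(attribute_weights(rows, labels))
print(classify(("red", 1.1), rows, labels))
```

In this toy data the numeric attribute separates the two classes perfectly after binning, so it receives the larger weight, and the three most similar rows each contribute a vote proportional to their similarity to the query.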