An Algorithm for Automatic Generation of a Case Base from a Database Using Similarity-Based Rough Approximation

Acquiring knowledge from domain experts is a bottleneck in the development of case-based reasoning systems. In recent years, large amounts of data have become available in many areas, so deriving representative cases from existing databases rather than from domain experts is both feasible and promising. This paper presents an algorithm that derives cases automatically from such databases. The algorithm is based on similarity-based rough set theory. It can handle inconsistent data and select a reasonable number of representative cases from a database. The algorithm was implemented in Java, and experimental results indicate that, under some conditions, the classification accuracy of the derived case base can exceed that of some well-known data mining systems, such as rule induction systems and neural network systems.
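To make the notion of similarity-based rough approximation concrete, the following minimal sketch computes the lower and upper approximations of a class under a symmetric similarity relation. It is an illustration only, not the algorithm presented in this paper: the similarity measure, the threshold value, the sample records, and all function names are assumptions chosen for the example, and it is written in Python for brevity rather than in the Java of the actual implementation.

```python
# Illustrative sketch only: the similarity measure, threshold, and data below
# are assumptions for this example, not the paper's definitions.

def similarity(a, b):
    """Assumed measure: 1 minus the mean absolute difference of attributes
    that are already scaled to [0, 1]. It is symmetric and reflexive."""
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def similarity_class(i, records, threshold):
    """Indices of all records whose similarity to record i meets the threshold."""
    xi = records[i][0]
    return {j for j, (xj, _) in enumerate(records)
            if similarity(xi, xj) >= threshold}

def approximations(records, label, threshold):
    """Return (lower, upper) approximations of the class `label`.

    lower: records whose whole similarity class carries `label`
           (they certainly belong to the class).
    upper: records whose similarity class contains at least one record
           with `label` (they possibly belong to the class).
    """
    target = {i for i, (_, y) in enumerate(records) if y == label}
    lower, upper = set(), set()
    for i in range(len(records)):
        sim = similarity_class(i, records, threshold)
        if sim <= target:      # subset test
            lower.add(i)
        if sim & target:       # non-empty intersection
            upper.add(i)
    return lower, upper

# Hypothetical records: (attribute vector, class label).
# Records 2 and 3 are nearly identical but labelled differently,
# which is one form of inconsistent data.
RECORDS = [
    ((0.10, 0.20), "A"),
    ((0.15, 0.25), "A"),
    ((0.50, 0.50), "A"),
    ((0.52, 0.50), "B"),
    ((0.90, 0.80), "B"),
]

if __name__ == "__main__":
    lower, upper = approximations(RECORDS, "A", threshold=0.9)
    print("lower:", sorted(lower))  # lower: [0, 1]
    print("upper:", sorted(upper))  # upper: [0, 1, 2, 3]
```

In this toy data set, records 2 and 3 fall only in the boundary region (upper minus lower) of class A, because they are similar to each other yet carry different labels. Under these assumptions, records in the lower approximation are the natural candidates for representative cases, while boundary records are where inconsistency has to be resolved.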