Reverse Engineering Databases for Knowledge Discovery

Many data mining tools cannot be applied directly to the complex sets of relations found in large database systems. In our experience, data miners rely on a well-defined data model, or on the knowledge of a data expert, to isolate and extract candidate data sets before mining. For many databases, typically large legacy systems, a reliable data model is often unavailable and access to the data expert can be limited. In this paper, we use reverse engineering techniques to infer a model of the database. Reverse engineering a database can be seen as knowledge discovery in its own right, and the resulting data model can be made available to data mining tools as background knowledge. In addition, minable data sets can be produced from the inferred data model and analyzed using conventional data mining tools. Our approach reduces the data miner's reliance on a well-defined data model and on the data expert.