A Four Stage Process and Four Element Model

Knowledge Discovery in Databases (KDD) is the process of extracting novel information and knowledge from large databases. The process consists of many interacting stages, each performing specific data manipulation and transformation operations, with information flowing from one stage into the next (and often back into earlier stages). The process can be very complex and can vary considerably with the variety of tasks undertaken within KDD. In this paper we characterise our experiences of the KDD process and formalise its key elements in a model. A case study of insurance risk analysis for setting policy premiums is used to illustrate the process and the model. The model provides a framework for comparing and differentiating various approaches to KDD.