Data Mining and Knowledge Discovery — an Overview

The emergence of data mining and knowledge discovery in databases (KDD) as a new technology is due to the rapid development and wide application of information and database technologies. As databases come into ever wider use, the ability to digest the large volumes of data being generated has become critical. It is estimated that only 5%-10% of commercial databases have ever been analysed [23]. As Massey and Newing [24] indicated, database technology has been successful in recording and managing data but has failed to move beyond data processing and turn data into a key strategic weapon for enhancing business competitiveness. The large volume and high dimensionality of databases lead to the breakdown of traditional human analysis. Data mining and KDD aim to develop methodologies and tools that automate the data analysis process and create useful information and knowledge from data to support decision making (Figure 2.1). A widely accepted definition is given by Fayyad et al. [25], in which KDD is defined as the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data. This definition regards KDD as a complicated process comprising a number of steps, of which data mining is one.