Enhancing Preprocessing in Data-Intensive Domains using Online-Analytical Processing

Applying data mining algorithms requires goal-oriented preprocessing of the data. In practical applications, preprocessing is very time-consuming and strongly influences the quality of the generated models. In this paper we describe a new approach to data preprocessing. Combining database technology with classical data mining systems, with an OLAP engine as the interface, we outline an architecture for OLAP-based preprocessing that enables interactive and iterative processing of data. This high level of interaction between the human analyst and the database system supports efficient understanding and preparation of data for building scalable data mining applications. Our case study, taken from the data-intensive telecommunication domain, applies the proposed methodology to derive user communication profiles. These profiles serve as input to data mining algorithms that cluster customers with similar behavior.
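
To make the idea of deriving user communication profiles concrete, the following is a minimal sketch of an OLAP-style roll-up over call records, not the architecture or implementation described in this paper. The record layout, the two-level time hierarchy (hour to day/night period), and the names `CALLS`, `period`, `roll_up`, and `profiles` are all assumptions introduced for illustration, and the sample data is made up.

```python
from collections import defaultdict

# Hypothetical call detail records: (user_id, hour_of_day, duration_seconds).
# The values are invented purely for illustration.
CALLS = [
    ("u1", 9, 120),
    ("u1", 14, 60),
    ("u1", 22, 300),
    ("u2", 10, 30),
    ("u2", 11, 90),
]


def period(hour):
    """Assumed time hierarchy: map an hour of the day to a coarser period."""
    return "day" if 8 <= hour < 18 else "night"


def roll_up(calls):
    """Sum call durations per (user, period), similar to a GROUP BY on a cube."""
    cube = defaultdict(int)
    for user, hour, duration in calls:
        cube[(user, period(hour))] += duration
    return dict(cube)


def profiles(cube, periods=("day", "night")):
    """Pivot the cube into one normalized usage vector per user."""
    users = sorted({user for user, _ in cube})
    result = {}
    for user in users:
        totals = [cube.get((user, p), 0) for p in periods]
        grand_total = sum(totals)
        # Normalize so that users with different call volumes are comparable.
        result[user] = [t / grand_total for t in totals] if grand_total else totals
    return result


if __name__ == "__main__":
    for user, vector in profiles(roll_up(CALLS)).items():
        print(user, vector)
```

Each resulting vector gives the share of a user's call time per period. Vectors of this kind could then be passed to a clustering algorithm of choice; the normalization step is one possible design decision, chosen here so that clusters reflect calling behavior rather than total volume.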