Data mining workflow templates for intelligent discovery assistance in RapidMiner

Knowledge Discovery in Databases (KDD) has evolved in recent years and reached a mature stage, offering a wide range of operators to solve complex tasks. User support for building workflows, in contrast, has not increased proportionally. The large number of operators available in current KDD systems makes it difficult for users to analyze data successfully. Moreover, workflows easily grow to contain many operators, and parts of a workflow are often applied several times, so building them manually is hard. In addition, workflows are not checked for correctness before execution. Hence, it frequently happens that the execution of a workflow stops with an error after several hours of runtime. In this paper we address these issues by introducing a knowledge-based representation of KDD workflows as a basis for cooperative-interactive planning. Moreover, we discuss workflow templates that can mix executable operators and tasks to be refined later into sub-workflows. This new representation helps users to structure and handle workflows, as it constrains the number of operators that need to be considered. We show that workflows can be grouped in templates, enabling re-use and simplifying KDD workflow construction in RapidMiner.
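
To make the idea of a template that mixes executable operators with tasks to be refined later more concrete, the following is a minimal Python sketch. It is not the representation used in the paper and does not use any RapidMiner API: the class names (`Operator`, `Task`, `Template`), the `requires`/`provides` property sets, and the operator names in the example are all assumptions chosen for illustration. It shows how open tasks might be replaced by sub-workflows and how a workflow could be checked for missing inputs before anything is executed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Operator:
    """An executable step (hypothetical stand-in for a KDD operator)."""
    name: str
    requires: frozenset = frozenset()  # data properties needed as input
    provides: frozenset = frozenset()  # data properties available afterwards


@dataclass
class Task:
    """An abstract step that still has to be refined into a sub-workflow."""
    name: str


Step = Union[Operator, Task]


@dataclass
class Template:
    """A workflow template: an ordered mix of operators and open tasks."""
    name: str
    steps: List[Step] = field(default_factory=list)

    def open_tasks(self) -> List[str]:
        """Names of the tasks that have not been refined yet."""
        return [s.name for s in self.steps if isinstance(s, Task)]

    def refine(self, methods: Dict[str, List[Step]]) -> "Template":
        """Replace every task that has an entry in `methods` by its steps."""
        new_steps: List[Step] = []
        for s in self.steps:
            if isinstance(s, Task) and s.name in methods:
                new_steps.extend(methods[s.name])
            else:
                new_steps.append(s)
        return Template(self.name, new_steps)

    def check(self, initial: frozenset = frozenset()) -> List[str]:
        """Report problems without executing any operator."""
        problems: List[str] = []
        state = set(initial)
        for s in self.steps:
            if isinstance(s, Task):
                problems.append(f"unrefined task: {s.name}")
                continue
            missing = s.requires - state
            if missing:
                problems.append(f"{s.name}: missing {sorted(missing)}")
            state |= s.provides
        return problems


# Hypothetical example: a classification template with one open task.
template = Template("classification", [
    Operator("ReadData", provides=frozenset({"table"})),
    Task("Preprocessing"),
    Operator("TrainModel",
             requires=frozenset({"table", "no_missing_values"}),
             provides=frozenset({"model"})),
])

# One possible refinement of the open task (also hypothetical).
methods = {
    "Preprocessing": [
        Operator("ReplaceMissingValues",
                 requires=frozenset({"table"}),
                 provides=frozenset({"no_missing_values"})),
    ],
}
refined = template.refine(methods)
```

In this sketch, `template.check()` flags both the unrefined task and the unmet input of `TrainModel`, whereas `refined.check()` reports no problems. Under these assumptions, such errors would surface before a long-running execution starts rather than hours into it.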