Efficient Data Mining: Scripting and Scalable Parallel Algorithms

This paper presents our approach to data mining, which couples parallel applications with a scripting language to produce an efficient and flexible toolbox. Parallel algorithms that scale in both data size and number of processors are key to solving the ever-growing problems in data mining. At the same time, data mining applications should be flexible enough to allow interactive data exploration. By using a toolbox written in a scripting language, we are able to steer parallel applications flexibly, thus meeting the data miner's need for fast, interactive data analysis. We discuss the chosen approach and present first results.