Using SQL primitives and parallel DB servers to speed up knowledge discovery in large relational databases

Efficiency is crucial in KDD (Knowledge Discovery in Databases) because of the huge amount of data stored in commercial databases. We argue that high efficiency in KDD can be achieved by combining two approaches: mapping KDD functionality onto standard DBMS operations, and executing KDD tasks on a parallel SQL server. We propose generic KDD primitives that underlie the candidate-rule evaluation procedures of many KDD algorithms, and we evaluate the speedup achieved by a parallel SQL server when executing a decision-tree learning algorithm implemented via these primitives.
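
As a rough illustration of what mapping candidate-rule evaluation onto standard DBMS operations might look like, the sketch below computes class frequencies per attribute value with a single GROUP BY query. A decision-tree learner needs this kind of statistic to score a candidate split. The table `cases`, its columns, and the helper `class_counts` are hypothetical and not taken from the paper, and an in-memory SQLite database merely stands in for a parallel SQL server.

```python
import sqlite3


def class_counts(conn, table, attribute, class_column):
    """Return {(attribute_value, class_value): count} from one GROUP BY query."""
    # Identifiers cannot be bound as SQL parameters, so they are quoted inline;
    # this sketch assumes trusted, hypothetical table and column names.
    query = (
        f'SELECT "{attribute}", "{class_column}", COUNT(*) '
        f'FROM "{table}" GROUP BY "{attribute}", "{class_column}"'
    )
    return {(a, c): n for a, c, n in conn.execute(query)}


# Hypothetical toy data: one predicting attribute and one class attribute.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cases (outlook TEXT, play TEXT)")
conn.executemany(
    "INSERT INTO cases VALUES (?, ?)",
    [
        ("sunny", "no"), ("sunny", "no"), ("sunny", "yes"),
        ("rain", "yes"), ("rain", "yes"), ("overcast", "yes"),
    ],
)

# The DBMS does the counting; the learner only consumes the small result.
counts = class_counts(conn, "cases", "outlook", "play")
```

Under these assumptions, the data-intensive counting runs inside the database engine, so only the aggregated counts reach the learning algorithm. That is the part of the workload a parallel server would be expected to speed up.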