Paper information - Using SQL primitives and parallel DB servers to speed up knowledge discovery in large relational databases

Efficiency is crucial in KDD (Knowledge Discovery in Databases) because of the huge amount of data stored in commercial databases. We argue that high efficiency in KDD can be achieved by combining two approaches: mapping KDD functionality onto standard DBMS operations, and executing KDD tasks on a parallel SQL server. We propose generic KDD primitives that underlie the candidate-rule evaluation procedures of many KDD algorithms, and we evaluate the speedup achieved by a parallel SQL server when executing a decision-tree learning algorithm implemented via these primitives.
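The abstract does not spell out the primitives themselves, so the following is only an illustrative sketch of how candidate-rule evaluation might be mapped onto a standard SQL operation. It assumes that a primitive takes the form of a grouped count (the number of tuples for each combination of attribute value and class value), which is the kind of statistic a decision-tree learner needs when scoring a candidate split. The function name `count_by_group`, the `customers` table, and its columns are hypothetical, and SQLite stands in for the parallel SQL server discussed in the paper.

```python
import sqlite3


def count_by_group(conn, table, attribute, class_column):
    """Return {(attribute_value, class_value): count} from one GROUP BY query.

    Hypothetical helper: the name and signature are assumptions made for
    illustration, not the paper's definition of its primitives.
    """
    # Identifiers cannot be bound as SQL parameters, so they are quoted
    # inline; only trusted table and column names should be passed in.
    query = (
        f'SELECT "{attribute}", "{class_column}", COUNT(*) '
        f'FROM "{table}" GROUP BY "{attribute}", "{class_column}"'
    )
    return {(a, c): n for a, c, n in conn.execute(query)}


# Toy data (made up for this example), held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (age_band TEXT, region TEXT, buys TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        ("young", "north", "yes"),
        ("young", "south", "no"),
        ("young", "north", "yes"),
        ("old", "north", "no"),
        ("old", "south", "no"),
    ],
)

# Class distribution per value of the candidate splitting attribute.
counts = count_by_group(conn, "customers", "age_band", "buys")
for (value, label), n in sorted(counts.items()):
    print(value, label, n)
```

In a sketch like this the learner only consumes the small table of counts, while the scan and aggregation over the full relation stay inside the DBMS, which is where a parallel server could plausibly provide the speedup.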

Alex A. Freitas | Simon Lavington
