Improving Parallelism in Structural Data Mining

The large amount of data collected daily requires efficient algorithms to process it. The SUBDUE data mining system discovers substructures in structurally complex data based on the minimum description length (MDL) principle. Its parallel implementation, MPI-SUBDUE, was created in 2001 to reduce computation time and/or to handle larger datasets. In this paper, a new, more efficient implementation of MPI-SUBDUE is introduced. Experimental results on the mutagenesis dataset show that the new implementation outperforms the original one by up to 33% and that the performance gain increases with the number of processors used.
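The abstract does not spell out how an MDL-based system scores a candidate substructure. One commonly described form compares the description length of the input graph G with that of the substructure S plus the graph rewritten using S: value = DL(G) / (DL(S) + DL(G|S)), where values above 1 indicate compression. The sketch below is a minimal illustration under stated assumptions: it replaces true description length with the simpler size measure |V| + |E|, it assumes the instances of S are already known and do not overlap, and the function names (`graph_size`, `compress`, `compression_value`) are hypothetical, not part of SUBDUE or MPI-SUBDUE.

```python
from collections.abc import Hashable, Iterable

# An edge is (source, target, label); labels mirror labeled input graphs.
Edge = tuple[Hashable, Hashable, str]


def graph_size(vertices: set, edges: list) -> int:
    """Size heuristic |V| + |E|, an assumed stand-in for description length."""
    return len(vertices) + len(edges)


def compress(vertices: set, edges: list, instances: Iterable[set]) -> tuple[set, list]:
    """Collapse each non-overlapping instance of S into one new vertex.

    Assumption: an instance is the subgraph induced by its vertex set, so
    every edge with both endpoints in the same instance is absorbed. All
    other edges are kept, redirected to the collapsed vertex where needed.
    """
    owner = {}
    for i, inst in enumerate(instances):
        for v in inst:
            owner[v] = ("SUB", i)  # hypothetical id for the collapsed vertex

    new_vertices = {owner.get(v, v) for v in vertices}
    new_edges = []
    for src, dst, label in edges:
        if src in owner and dst in owner and owner[src] == owner[dst]:
            continue  # internal edge: now represented by the substructure
        new_edges.append((owner.get(src, src), owner.get(dst, dst), label))
    return new_vertices, new_edges


def compression_value(vertices: set, edges: list, sub_size: int, instances: list) -> float:
    """size(G) / (size(S) + size(G|S)); larger values mean better compression."""
    cv, ce = compress(vertices, edges, instances)
    return graph_size(vertices, edges) / (sub_size + graph_size(cv, ce))


if __name__ == "__main__":
    # Two triangles joined by a bridge edge; S is a triangle (3 vertices + 3 edges).
    V = {1, 2, 3, 4, 5, 6}
    E = [(1, 2, "e"), (2, 3, "e"), (3, 1, "e"),
         (4, 5, "e"), (5, 6, "e"), (6, 4, "e"), (3, 4, "bridge")]
    print(round(compression_value(V, E, 6, [{1, 2, 3}, {4, 5, 6}]), 3))  # → 1.444
```

In the example, G has size 13, the triangle S has size 6, and the compressed graph has two vertices and one edge (size 3), giving 13 / (6 + 3) ≈ 1.444. A real MDL evaluation would encode vertex and edge labels and adjacency in bits rather than counting elements, so this sketch only illustrates the shape of the ratio.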

Marcin Paprzycki, Min Cai, Istvan Jonyer
