DMGrid: A Data Mining System Based on Grid Computing

Data mining researchers face a common problem: data mining tasks are time-consuming because they must process large-scale datasets. Grid computing integrates distributed, heterogeneous, and idle computers on the Internet into a high-performance service system. Grid computing can therefore supply the computational capability needed to reduce task durations effectively. We have developed DMGrid, a grid for data mining applications. DMGrid treats efficient parallel computing as a crucial aspect and also takes dynamic resource configuration into account. Unlike many existing data mining grids, DMGrid provides an engine that executes the algorithm flow specified in an application, and it offers application execution monitoring. Finally, we perform experiments and design two applications, Customer Churning Analysis and Customer Value Analysis, through which the feasibility of DMGrid is validated.
