The shift towards intrinsically distributed, complex problem-solving environments is creating a need for new systems. Such systems should exploit the virtually unlimited data and computational resources of the Grid while hiding the associated complexity from the user. Currently, no coherent framework lets data miners, who are usually not Grid experts, easily construct data mining tasks and execute them on the Grid. There is therefore a need for a complete system that includes: a) a user-friendly environment for defining complex data mining tasks, and b) Grid middleware that supports the execution of such tasks, manages data and computational resources, and provides sophisticated job-monitoring capabilities. This paper focuses on the high-level design of such a system, which is currently being developed in the DataMiningGrid project, with particular emphasis on the design and implementation of the resource broker service. We show how different resources from various domains can be exploited so that data mining researchers can access and use the resources required by modern, distributed and computationally intensive data mining algorithms.