Service-based Resource Brokering for Grid-Based Data Mining

The shift towards intrinsically distributed, complex problem-solving environments is prompting a need for new systems that utilize the virtually unlimited data and computational resources of the Grid while hiding the related complexity from the user. Currently, there is no coherent framework that offers data miners, who are usually not Grid experts, the ability to easily construct data mining tasks and execute them on the Grid. There is therefore a need to assemble a complete system that includes (a) a user-friendly environment for defining complex data mining tasks and (b) Grid middleware that supports the execution of such tasks, provides mechanisms for managing data and computational resources, and offers sophisticated job-monitoring capabilities. This paper focuses on the high-level design of such a system, which is currently being developed in the DataMiningGrid project, with emphasis on the design and implementation of the resource broker service. We show how resources from various domains can be exploited to give data mining researchers access to the resources needed for modern, distributed, and computationally intensive data mining algorithms.