A method for (1) efficient discovery of data in large, distributed raw datasets and (2) collection of the data thus procured is considered. It is a purely peer-to-peer method without any centralized control and is therefore primarily intended for large-scale, dynamic (data) grid environments. It provides a simple but highly efficient mechanism for keeping the load it causes under control, and it is especially useful when data discovery and collection are performed simultaneously with dataset generation. The method supports user-specified extraction of structured metadata from raw datasets and automatically aggregates the extracted metadata. It is based on the principle of ant colony optimization (ACO). The paper focuses on effective data aggregation and describes in detail the modifications to the basic ACO algorithm that are needed to aggregate the extracted data effectively. Using a simulator, the method was extensively tested on a wide set of network topologies with different rates of data extraction and aggregation. The results of the most significant tests are included.
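The abstract does not spell out the modified algorithm, so the following is only a generic ACO-style sketch of the ideas it names, not the authors' method. Ants start at random peers and walk the overlay. Each next hop is chosen with probability proportional to edge pheromone raised to `alpha`. At each peer, an ant applies a user-supplied `extract` function to the raw records and aggregates the resulting metadata keys in its payload. It then reinforces the edge that led it to data, and pheromone evaporates at rate `rho` after every round. Load is bounded by a fixed number of ants per round and a hop limit `ttl`. All names (`aco_collect`, `extract`, `ants_per_round`, `ttl`, `alpha`, `rho`) are hypothetical. As a simplification, each ant hands its payload straight to a single collector instead of routing it back through the network.

```python
import random
from collections import Counter


def aco_collect(graph, raw_data, extract, n_rounds=50, ants_per_round=4,
                ttl=8, alpha=1.0, rho=0.1, seed=0):
    """Illustrative ACO-style discovery and aggregation (hypothetical sketch).

    graph     -- peer overlay as adjacency lists: node -> list of neighbours
    raw_data  -- node -> list of raw records held by that node
    extract   -- user-specified function mapping a raw record to a metadata key
    """
    rng = random.Random(seed)
    # Pheromone on every directed edge; uniform at first so ants explore freely.
    tau = {(u, v): 1.0 for u in graph for v in graph[u]}
    # Copy the records so the caller's data is not modified.
    pending = {node: list(recs) for node, recs in raw_data.items()}
    result = Counter()
    nodes = list(graph)

    for _ in range(n_rounds):
        # Assumed load control: a fixed number of ants per round, each with a hop limit.
        for _ in range(ants_per_round):
            node = rng.choice(nodes)
            payload = Counter()
            edge = None  # directed edge the ant used to reach `node`
            for _ in range(ttl):
                records = pending.get(node, [])
                if records:
                    # Extract metadata locally and aggregate it into the payload;
                    # records are then marked collected so no key is counted twice.
                    payload.update(extract(r) for r in records)
                    pending[node] = []
                    if edge is not None:
                        tau[edge] += len(records)  # reinforce the productive edge
                neighbours = graph[node]
                if not neighbours:
                    break
                # Next hop chosen with probability proportional to tau ** alpha.
                weights = [tau[(node, v)] ** alpha for v in neighbours]
                nxt = rng.choices(neighbours, weights=weights, k=1)[0]
                edge = (node, nxt)
                node = nxt
            result.update(payload)  # deliver the ant's aggregate to the collector
        # Evaporation, with a small floor so every edge stays selectable.
        for e in tau:
            tau[e] = max((1.0 - rho) * tau[e], 0.01)
    return result


# Usage: six peers; the metadata key extracted from each "file" is its type.
graph = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 4], 3: [1, 5], 4: [2, 5], 5: [3, 4]}
raw = {0: ["a.csv", "b.log"], 3: ["c.csv"], 5: ["d.csv", "e.log", "f.csv"]}
print(aco_collect(graph, raw, lambda name: name.rsplit(".", 1)[1]))
```

Because each record is harvested only once, the aggregate can never over-count, however many ants pass through a peer. Evaporation lets the colony stop favouring paths to peers that have already been emptied.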