A Coarse-Grained Parallel Genetic Algorithm for Clustered Document Allocation in Multiprocessor Information Retrieval Systems
This paper presents a new coarse-grained parallel genetic algorithm for the document allocation problem in multiprocessor information retrieval (IR) systems. The objective is to find a mapping of a clustered collection of documents onto the processing nodes that minimizes the average cluster diameter while distributing the documents evenly across the nodes. We prove that this problem is NP-complete and describe a heuristic algorithm for allocating the clustered documents in a way that supports efficient access. Our parallel genetic algorithm is based on a hybrid of the island and neighborhood models, whose distributed population structure offers inherent parallelism and reduces the possibility of premature convergence. The parallel algorithm has been developed for a distributed-memory multiprocessor IR system, and its performance has been evaluated. We empirically investigate the effects of varying the distribution of documents across the clusters, the impact of data skew, and the effects of replicating documents at different nodes in the multiprocessor IR system. As part of the experimental analysis, we also study the impact of varying the system parameters, the migration period, the migration volume, the probability of mutation, and the population size. We report our experimental observations, including the solution quality of the allocations and the scale-up, speedup, and time, as a behavioral evaluation of the algorithm.
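To make the roles of the migration period, migration volume, mutation probability, and population size concrete, the following is a minimal Python sketch of a plain island-model genetic algorithm for document allocation. It is not the paper's algorithm: it omits the neighborhood-model component of the hybrid, runs the islands sequentially rather than on separate processors, and ignores document replication. Every name in it (`cost`, `evolve_island`, `migrate`, `run`) is hypothetical. It also assumes that the distance between nodes `a` and `b` is `|a - b|`, that a cluster's diameter is the largest distance between two nodes holding its documents, and that load imbalance is penalized as the gap between the most and least loaded node; the abstract specifies none of these.

```python
import random

def cost(alloc, clusters, num_nodes, balance_weight=1.0):
    """Average cluster diameter plus a load-imbalance penalty (lower is better).

    alloc[d] is the node holding document d; clusters is a list of lists
    of document indices. Both definitions below are assumptions.
    """
    diameters = []
    for docs in clusters:
        nodes = [alloc[d] for d in docs]
        diameters.append(max(nodes) - min(nodes))  # assumed distance |a - b|
    loads = [0] * num_nodes
    for node in alloc:
        loads[node] += 1
    imbalance = max(loads) - min(loads)  # assumed evenness measure
    return sum(diameters) / len(diameters) + balance_weight * imbalance

def evolve_island(pop, clusters, num_nodes, p_mut, rng):
    """One generation: elitism, binary tournaments, one-point crossover, mutation."""
    key = lambda c: cost(c, clusters, num_nodes)
    children = [min(pop, key=key)[:]]  # keep a copy of the island's best
    while len(children) < len(pop):
        p1 = min(rng.sample(pop, 2), key=key)
        p2 = min(rng.sample(pop, 2), key=key)
        cut = rng.randrange(1, len(p1))
        child = p1[:cut] + p2[cut:]
        # Each gene is reassigned to a random node with probability p_mut.
        child = [rng.randrange(num_nodes) if rng.random() < p_mut else g
                 for g in child]
        children.append(child)
    return children

def migrate(islands, volume, clusters, num_nodes):
    """Ring migration: each island's best `volume` individuals replace
    the worst `volume` individuals of the next island."""
    key = lambda c: cost(c, clusters, num_nodes)
    emigrants = [sorted(pop, key=key)[:volume] for pop in islands]
    for i, pop in enumerate(islands):
        pop.sort(key=key)
        pop[-volume:] = [c[:] for c in emigrants[(i - 1) % len(islands)]]

def run(clusters, num_docs, num_nodes, num_islands=4, pop_size=20,
        generations=60, migration_period=10, migration_volume=2,
        p_mut=0.05, seed=0):
    """Evolve all islands and return the best allocation found."""
    rng = random.Random(seed)
    islands = [[[rng.randrange(num_nodes) for _ in range(num_docs)]
                for _ in range(pop_size)] for _ in range(num_islands)]
    for gen in range(1, generations + 1):
        islands = [evolve_island(pop, clusters, num_nodes, p_mut, rng)
                   for pop in islands]
        if migration_volume > 0 and gen % migration_period == 0:
            migrate(islands, migration_volume, clusters, num_nodes)
    return min((c for pop in islands for c in pop),
               key=lambda c: cost(c, clusters, num_nodes))

if __name__ == "__main__":
    clusters = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
    best = run(clusters, num_docs=9, num_nodes=3)
    print(best, cost(best, clusters, 3))
```

In this sketch, a shorter migration period or a larger migration volume spreads good allocations between islands faster, at the price of making the island populations more alike, which is the trade-off the parameter study in the abstract refers to.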