Resource management in distributed systems

In academia and industry, there is renewed interest in using clusters of workstations for high performance tasks as an alternative to tightly coupled parallel monoliths. There are several reasons for this. Firstly, monolithic machines are expensive and dedicated, whereas clusters are relatively cheap and general purpose. Secondly, 64-bit RISC technology boosts the performance per workstation node to figures comparable to those of dedicated monolith nodes. Moreover, new software technologies provide better programming environments, such as PVM and MPI, and resource management tools, such as Codine and Condor.

The major reason for the upsurge in interest, however, seems to be the increasing network bandwidth, which supports fast and reliable communication between nodes in a cluster environment. For example, stable installations of ATM-SONET Local Area Networks running at 100, 150 and in some cases even 622 Mbit/s were reported by Tolmie at the HPCN95 Europe conference in Milan. At the Supercomputing ’95 conference, an experimental networking project (the so-called ‘I-way’) was announced, aiming at a 622 Mbit/s network connecting 40 major institutes in the USA.

One of the major remaining questions is what the consequences are for parallel applications that are highly tuned to tightly coupled parallel systems. This is not merely a question of portability, but also one of more fundamental differences in execution behaviour, especially when issues such as load balancing are considered. The major differences stem from the fact that cluster computing implies a dynamic (and often heterogeneous) computing resource, as opposed to the static (homogeneous) resources available in monolithic computing.

In this special issue, you will find invited papers on new developments in the field of PVM (Parallel Virtual Machine), the de facto standard in message-passing programming environments, and in the field of resource management for distributed systems.
Moreover, you will find papers describing new approaches to combining PVM-like message-passing environments with distributed resource management systems. I expect that this integration of techniques and paradigms will help bridge the gap between distributed and parallel computing, and that it will stimulate the use of parallelism in both academia and industry.

In the first paper, M. Bernaschi explores the basic issues that need to be addressed to obtain efficient implementations of PVM. His experiences, gained through the PVMe project, serve as a guideline for anyone interested in run-time system support when porting message-passing libraries. The next paper, by A. Ciampolini and C. Stefanelli, reports on the porting of PVM to a Meiko CS, such that a user can efficiently distribute any application partly on ‘traditional’ PVM nodes and partly on the Meiko CS nodes. This heterogeneous PVM supports uniform process allocation and transparent inter-process communication. The third paper, by one of the developers of PVM, V.S. Sunderam, and by S.A. Moyer, addresses the issue of parallel input