Network-Aware Distributed Computing: A Case Study

The development of network-aware applications, i.e., applications that dynamically adapt to network conditions, has had some success in the domain of multimedia applications, but progress has been very slow for distributed computing applications. The reason is that the relationship between application performance and network performance is typically more complex for distributed computing applications, which makes adaptation difficult. In this paper we introduce two adaptation methods for distributed computing applications: one based on a performance model, and another based on balancing computation and communication time. We illustrate the two methods using a simple distributed application (matrix multiply) and compare their performance. We show that both methods can correctly estimate the best number of nodes to use on our testbed. We also show that both methods have weaknesses. Model-based adaptation requires an accurate performance model and is sensitive to errors in measurements of the system parameters. The ratio-based method, which balances computation and communication time, is more robust but less general.
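To make the two ideas concrete, the following is a minimal Python sketch of how a node count might be chosen for a distributed matrix multiply. It is not the model or the procedure used in this paper. The cost terms, the function names (`predicted_times`, `best_nodes_model`, `best_nodes_ratio`), and the parameters (`flops_per_node`, `bandwidth`, `latency`) are all hypothetical. The sketch assumes that computation divides evenly across nodes and that a single sender ships one n-by-n matrix of doubles to each node in turn.

```python
def predicted_times(p, n, flops_per_node, bandwidth, latency):
    """Return (compute, communicate) times in seconds under a toy cost model.

    Assumptions (hypothetical, for illustration only):
      - an n-by-n multiply costs 2*n^3 floating-point operations, split evenly;
      - one sender transmits an n-by-n matrix of 8-byte doubles to each of
        the p nodes in sequence, paying a fixed latency per transfer.
    """
    compute = 2.0 * n**3 / (p * flops_per_node)
    bytes_per_node = 8.0 * n * n
    communicate = p * (latency + bytes_per_node / bandwidth)
    return compute, communicate


def best_nodes_model(max_nodes, n, flops_per_node, bandwidth, latency):
    """Model-based choice: the node count with the smallest predicted total time."""
    return min(
        range(1, max_nodes + 1),
        key=lambda p: sum(predicted_times(p, n, flops_per_node, bandwidth, latency)),
    )


def best_nodes_ratio(max_nodes, n, flops_per_node, bandwidth, latency):
    """Ratio-style choice: the largest node count at which computation time
    is still at least as large as communication time."""
    best = 1
    for p in range(1, max_nodes + 1):
        compute, communicate = predicted_times(p, n, flops_per_node, bandwidth, latency)
        if compute >= communicate:
            best = p
    return best


if __name__ == "__main__":
    # Illustrative parameter values, not measurements from any testbed.
    args = (16, 1000, 1e8, 1.6e7, 0.25)
    print("model-based:", best_nodes_model(*args))
    print("ratio-based:", best_nodes_ratio(*args))
```

With these illustrative values both rules select 5 nodes. In this toy model the two rules can differ by one node for other parameter values, because the total time is minimized close to the point where the two cost terms are equal, but not always at the same integer. In practice a ratio-style rule would more plausibly work from measured computation and communication times than from a model.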