Distributing large data to many nodes, known as broadcast or multicast, is an important operation in parallel and distributed computing. Most previous broadcast algorithms explicitly or implicitly try to deliver data to all nodes at the same rate. This assumption is reasonable for homogeneous environments where all nodes have similar receiving capabilities. However, when receiving capabilities vary, slow receivers limit the bandwidth delivered to every node under these algorithms. In such settings, each node wants to receive data at its largest possible bandwidth and to start computation as soon as it has received the data. In this paper, we define a broadcast to be stable when the bandwidth to a node is never sacrificed by the presence of other, possibly slow, receiving nodes, and we propose stability as a desired property of broadcast algorithms. In addition, we present a simple and efficient stable broadcast algorithm for the case where the topology among nodes is a tree and each link has symmetric bandwidth. This work improves upon previously proposed algorithms such as FPFR and Balanced Multicasting. For general graphs, it outperforms them when the network is heterogeneous; for trees, our algorithm is proved to be stable and optimal. In a real environment with 100 machines in 4 clusters, our scheme achieved 2.1 to 2.6 times the aggregate bandwidth of the best result among the other algorithms. We also demonstrated stability by adding a slow node to a broadcast. Simulations further showed that our algorithm performs well under many bandwidth distributions.
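The stability property can be illustrated with a small sketch. It is not the paper's algorithm: it only computes, for a hypothetical tree with symmetric link bandwidths, the rate each node could receive if it were the sole receiver (the bottleneck link on its path from the source), and compares that per-node target with a single-rate broadcast in which every node is held to the slowest receiver. The topology, node names, and bandwidth values are assumptions made up for illustration.

```python
from typing import Dict, Tuple

# Hypothetical tree: child -> (parent, link bandwidth in Mbps).
# "src" is the broadcast source; "c" sits behind a slow link.
Tree = Dict[str, Tuple[str, float]]

TREE: Tree = {
    "a": ("src", 1000.0),
    "b": ("a", 1000.0),
    "c": ("a", 100.0),
    "d": ("src", 500.0),
}


def bottleneck_bandwidth(tree: Tree, root: str, node: str) -> float:
    """Smallest link bandwidth on the path from root to node."""
    bandwidth = float("inf")
    while node != root:
        parent, link = tree[node]
        bandwidth = min(bandwidth, link)
        node = parent
    return bandwidth


def stable_rates(tree: Tree, root: str) -> Dict[str, float]:
    """Per-node target of a stable broadcast: each node's own bottleneck."""
    return {n: bottleneck_bandwidth(tree, root, n) for n in tree}


def single_rate(tree: Tree, root: str) -> Dict[str, float]:
    """Same-rate delivery: every node is limited by the slowest receiver."""
    slowest = min(stable_rates(tree, root).values())
    return {n: slowest for n in tree}


if __name__ == "__main__":
    stable = stable_rates(TREE, "src")
    uniform = single_rate(TREE, "src")
    print("stable:", stable, "aggregate:", sum(stable.values()))
    print("single-rate:", uniform, "aggregate:", sum(uniform.values()))
```

In this made-up topology, removing the slow node "c" leaves the stable targets of the other nodes unchanged, whereas the single-rate figure for every node depends on whether "c" is present.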