Paper information - Reducing network contention with mixed workloads on modern multicore clusters

Multi-core systems are now extremely common in modern clusters. In the past, commodity systems may have had two or four CPUs per compute node. Systems in modern clusters still have the same number of CPUs; however, these CPUs have moved from single-core to quad-core, and further advances are imminent. To obtain the best performance, compute nodes in a cluster are connected with high-performance interconnects. On nearly all clusters, the number of network interfaces per node is the same on current multi-core systems as it was when there were fewer cores per node. Although network bandwidth has increased with the shift to multi-core, severe network contention still exists for some application patterns. In this work, we propose mixed-workload (non-exclusive) scheduling of jobs to increase network efficiency and reduce contention. As a case study, we use Message Passing Interface (MPI) programs on the InfiniBand interconnect. We show through detailed profiling of the network that the network and CPU accesses of some applications are complementary to each other, which leads to increased network efficiency and improved overall application performance. We show improvements of 20% and more for some of the NAS Parallel Benchmarks on quad-socket, quad-core AMD systems.

Dhabaleswar K. Panda | Miao Luo | Matthew J. Koop
