The hidden cost of low-bandwidth communication

Every year, manufacturers of massively parallel computers release new machines with ever more impressive benchmark performance. Unfortunately, these machines are not widely used in practice. To date, they have been used efficiently mostly to solve problems whose communication structure is well understood. Furthermore, the software for solving these problems has been highly engineered, and is both data- and machine-specific. Progress has been made recently on solving more irregular problems, and machine-independent programming languages such as HPF have made software more portable, but the machines remain difficult to program. Our position is that much of this difficulty stems from a lack of appreciation of the impact of low-performance interconnect on software development.

To understand why the cost of low-performance interconnect is underestimated, it helps to look at the standards by which parallel computers have been measured until now. Five years ago, a typical measure of the power of a massively parallel computer was the peak rate at which it could perform floating-point operations, i.e., how many gigaflops (billions of floating-point operations per second) it could perform. Manufacturers produced machines with very high peak floating-point performance, but this measure proved to be a poor predictor because the machines could rarely achieve it in practice. More recently, the performance of these machines has been judged against a set of standard benchmarks, which include the LINPACK and NAS [1] benchmarks. As shown in Figure 1, these benchmarks, especially the NAS, have exposed a large difference between peak and achievable performance. One conclusion that can be drawn from these benchmarks is that machines with high communication bandwidth perform well across the board, whereas peak floating-point performance is relevant only on embarrassingly parallel problems. The benchmarks, however, still do not reveal the full cost of inadequate communication bandwidth.
Absent from the performance statistics is the cost of software development. In particular, the statistics fail to capture (1) the time to write the code, (2) the time to write the compiler that translates the code, (3) the time to port the code to different distributions of data, and (4) the time to port the code to different machines. These hidden costs can be substantial. First, although the NAS benchmarks are relatively simple, manufacturers of parallel machines typically spend multiple man-years of software development to extract high performance from their machines. Second, the time reported for the development of code for specific benchmarks is often …