A Kernel Testbed for Parallel Architecture, Language, and Performance Research

Computer scientists are in constant need of representative applications to guide how architectures, languages, and programming models should evolve for optimal performance, efficiency, and productivity. Unfortunately, this guidance is most often taken from existing software/hardware systems. Architects often focus on micro-architectural solutions that improve performance on fixed binaries, locking in rather arbitrary code sequences as the metric for success. Researchers tweak compilers to improve code generation for existing architectures and implementations, and they may invent new programming models for fixed processor and memory architectures and computational algorithms. In today’s rapidly evolving world of on-chip parallelism, these isolated and iterative improvements to performance may miss superior solutions in the same way that gradient descent optimization techniques may get stuck in local minima. In an ongoing project at LBNL, we have developed an alternative approach that, rather than starting with an existing hardware/software solution laced with hidden assumptions, defines the computational problems of interest and invites architects, researchers, and programmers to implement novel hardware/software co-designed solutions. Our work builds on the previous ideas of computational dwarfs, motifs, and parallel patterns by selecting a representative set of essential problems, for each of which we provide an algorithmic description, a scalable problem definition, illustrative reference implementations, verification schemes, and optimized sequential implementations. For simplicity, we focus initially on the computational problems of interest to the scientific computing community, but we regard the methodology (and perhaps a subset of the problems) as applicable to other communities. We intend to broaden the coverage of this problem space through stronger community involvement.
Previous work has established a broad categorization of numerical methods of interest to the scientific computing community, in the spirit of the NAS Benchmarks [3], which pioneered the basic idea of a “pencil and paper benchmark” in the 1990s. The initial result of the more modern study was the seven dwarfs, which were subsequently extended to 13 motifs [4, 1, 2]. These motifs have already been useful in defining classes of applications for architecture-software studies. However, these broad-brush problem statements often miss the nuance seen in individual kernels. For example, the computational requirements of particle methods vary greatly between the naive (but more accurate) direct calculations and the particle-mesh and particle-tree codes. Therefore, we started our study with an enumeration of interesting, important, and non-trivial problems, and then provided not only reference implementations for each problem but, more importantly, a mathematical definition that allows one to escape iterative approaches to software/hardware optimization. To ensure long-term value, we have augmented each of our reference implementations with both a scalable problem generator and a verification scheme. Additionally, we may provide an optimized reference implementation that offers insight into the bottlenecks on existing hardware and into researchers’ optimizations to eliminate, hide, or mitigate them. One long-term goal of our project is to collect a diverse set of alternative implementations to enable a broad range of computer science research without pre-ordaining a single specific implementation. In a previous paper [5], we describe in detail this process of problem definition, scalable input creation, verification, and implementation of reference codes for the scientific computing domain. Table 1 enumerates the kernels and describes the level of support we have developed for each.
We group these important kernels according to the Berkeley dwarfs/motifs taxonomy, marking each with a red box in the appropriate column. As kernels become progressively more complex, they build upon other, simpler computational methods. We note these dependencies with orange boxes.
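To make the structure of a kernel entry concrete, the following minimal Python sketch pairs a scalable problem generator, a sequential reference implementation, and a verification scheme. It is an illustration only and is not taken from the testbed: the choice of problem (a tridiagonal system with a manufactured solution), the function names generate_problem, reference_solve, and verify, and the tolerance are all assumptions made for this example.

```python
import math


def generate_problem(n):
    """Hypothetical scalable generator: build b = A x_true for
    A = tridiag(-1, 2, -1) of size n, with a known solution x_true."""
    x_true = [math.sin(math.pi * (i + 1) / (n + 1)) for i in range(n)]
    b = []
    for i in range(n):
        left = x_true[i - 1] if i > 0 else 0.0
        right = x_true[i + 1] if i < n - 1 else 0.0
        b.append(2.0 * x_true[i] - left - right)
    return b, x_true


def reference_solve(b):
    """Sequential reference implementation: Thomas algorithm
    specialised to the matrix tridiag(-1, 2, -1)."""
    n = len(b)
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    c_prime[0] = -1.0 / 2.0
    d_prime[0] = b[0] / 2.0
    # Forward elimination.
    for i in range(1, n):
        denom = 2.0 + c_prime[i - 1]
        c_prime[i] = -1.0 / denom
        d_prime[i] = (b[i] + d_prime[i - 1]) / denom
    # Back substitution.
    x = [0.0] * n
    x[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x


def verify(x, x_true, tol=1e-8):
    """Verification scheme: maximum absolute error against the
    manufactured solution (the tolerance is an assumed value)."""
    err = max(abs(a - t) for a, t in zip(x, x_true))
    return err <= tol, err


if __name__ == "__main__":
    for n in (8, 64, 512):  # the same definition scales with n
        b, x_true = generate_problem(n)
        ok, err = verify(reference_solve(b), x_true)
        print(n, ok, err)
```

Because the verification depends only on the problem definition and not on how the solution was computed, any alternative implementation (parallel, accelerated, or written in a different language) could replace reference_solve and be checked in the same way.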