A study of frameworks for collectively achieving the productivity, portability, and adoptability goals of parallel software

This dissertation targets the triple goal of productivity, portability, and adoptability by taking the position that infrastructure is needed that takes into account the full complexity of the real-world problem and acts as a guide to make independent research efforts coherent and cumulative. The first part of this dissertation discusses the nature of the problem. It is not scholarly, due to the complexity and imprecision of the real-world aspects. It is best viewed as a first step in an iterative process: an attempt is made at informally modeling the problem, followed by proof-of-concept solutions aligned with that model, which motivates the next iteration of more objective and thorough observations to create a better model, followed by better solutions guided by the new model, and so on. Such iteration is the standard approach to breaking cyclic dependencies, and this dissertation takes the position that it is a good time to try adding such informal models of real-world phenomena that affect productivity and adoption. This dissertation defines the mapping of an application onto hardware as porting, and the subset of that mapping related to performance as the specialization process. The analysis of parallel computation exposes what the specialization process needs as input to produce a highly efficient mapping. The parallel-computation model's main outcomes are the activities to be performed during specialization and the consequent needs of specialization. The activities include identifying boundaries of parallel tasks in the application, modifying the structure of the application to make the resultant tasks and their communication patterns fit efficiently onto the hardware, inserting a runtime implementation tuned to the target hardware, and performing low-level optimizations of the task code. These activities need the language to expose tasks, the dependencies between them, and certain properties of the tasks, and to prevent scheduling code from appearing in the application.
The second part of this dissertation describes two candidate frameworks that were designed in accordance with the model and analysis. Each candidate framework has a working proof-of-concept prototype. The first, called BLISS, is based on a centralized server that performs specialization after development but before distribution of the installable image. The second is based on the morphable hardware abstraction called VMS. VMS has no application-usable semantics of its own, but rather accepts those in the form of a plugin, which makes it morphable. It is a direct replacement for the Thread hardware abstraction. It is also suitable as the basis for a portability framework because it breaks the specialization process into three independent steps, allowing separate entities to perform those steps independently. Hence, no centralized server is needed, and the decomposition increases reuse of effort put into specialization tools. These frameworks provide a path for reusable tools to deliver performance portability for the widest possible range of applications, written in highly productive languages and environments. The frameworks fit various aspects of the software segments, addressing adoptability issues that have blocked uptake of other approaches. Rather than try to create the languages and specialization tools themselves, the frameworks instead provide standard interfaces and a methodology to support other, unconnected groups in developing the languages and tools. Some initial tests have been performed with the frameworks to gain an indication of how well they might support a collective search. For the embedded segment, performance is the top priority, and a poorly performing framework would block adoptability, so performance numbers are given for proof-of-concept languages built on top of the frameworks. These show, for each framework, that it does not present a performance barrier to adoption.
A modest productivity study of creating new languages based on VMS is also presented, and shows that the search for new parallel languages, such as embedded domain-specific languages, is accelerated by VMS. A proof-of-concept specialization server is demonstrated in BLISS, which shows that automated specialization can successfully take place in a centralized server. This in turn shows that BLISS provides the necessary conditions for specialization, and hence for portability. (Abstract shortened by UMI.)