Thesis information - Parallel programming languages for collections

Parallel programming languages for collections

The thesis discusses the design, expressive power, and implementation of parallel programming languages for collections, that is, the fragment of an object-oriented query language that deals with collections. The relational algebra has a simple, intrinsic parallel semantics, which enabled the successful development of parallel relational database systems. But the implementation techniques of these systems do not carry over to the more complex object-oriented databases. In order to develop efficient parallel object-oriented database systems, one needs to (1) design their query languages with parallelism in mind, and (2) find new implementation techniques, specifically designed for these languages. Here we pursue these goals for parallel languages for collections. The collections of interest are sets, bags, and sequences (lists). We start by describing a basic collection calculus and additional forms of recursion on collections. They have an idealized parallel "execution", assuming unbounded resources and instant communication, which gives us high-level parallel complexity measures. An interesting fragment of the calculus expresses exactly the queries in the parallel complexity class NC. Here the salient construct is divide and conquer recursion on sets. Sub-languages obtained by imposing a bound $k$ on the depth of recursion nesting correspond to the subclasses $AC^{k}$, for $k \ge 1$. We break the implementation of the calculus into three steps. First, sets and bags are implemented on sequences, using high-level parallel algorithms: we express such algorithms in a high-level language for sequences called $\mathcal{MAP}$, built around a new form of recursion. Second, we describe a complexity-preserving compilation of $\mathcal{MAP}$ on a simple vector-parallel model. Third, we implement the vector model on a parallel multiprocessor. Here we choose as target the LogP model, which can be instantiated to simulate various multiprocessors.
All but one of the vector model instructions require only restricted forms of communication patterns on LogP, called monotone communications. These in turn admit efficient implementations on LogP. We ran two simple benchmarks on a LogP simulator, measuring the speedup and the scaleup. We report conditions under which good speedup and scaleup can be expected.
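
To make the idea of divide and conquer recursion on sets more concrete, the following is a minimal sketch in Python, not the calculus or notation used in the thesis. The function name `dcr` and its parameters `empty`, `single`, and `combine` are hypothetical, chosen for illustration only. The sketch assumes that `combine` is associative and commutative with `empty` as its identity, so that the result does not depend on how the set is split. It runs sequentially, but it also returns the depth of the recursion tree, which stands in for the parallel time of an idealized execution in which both halves are evaluated at once.

```python
from operator import add, xor
from typing import Callable, FrozenSet, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def dcr(empty: B,
        single: Callable[[A], B],
        combine: Callable[[B, B], B],
        items: FrozenSet[A]) -> Tuple[B, int]:
    """Divide and conquer over a set; returns (result, recursion depth).

    Hypothetical helper for illustration. Assumes `combine` is
    associative and commutative, with `empty` as its identity.
    """
    elems = list(items)  # fix an arbitrary order so the set can be halved

    def go(lo: int, hi: int) -> Tuple[B, int]:
        n = hi - lo
        if n == 0:
            return empty, 0
        if n == 1:
            return single(elems[lo]), 1
        mid = lo + n // 2
        # In an idealized parallel execution the two halves are
        # independent, so the depth is one more than the deeper half.
        left, depth_left = go(lo, mid)
        right, depth_right = go(mid, hi)
        return combine(left, right), 1 + max(depth_left, depth_right)

    return go(0, len(elems))


# Sum of eight elements: the recursion tree has logarithmic depth.
total, depth = dcr(0, lambda x: x, add, frozenset(range(1, 9)))
print(total, depth)  # prints "36 4"

# Parity of the cardinality of a set.
odd, _ = dcr(False, lambda _: True, xor, frozenset({"a", "b", "c"}))
print(odd)  # prints "True"
```

In this sketch the depth grows logarithmically with the size of the set, which is the kind of high-level parallel complexity measure the abstract refers to. Nesting one such recursion inside the `combine` step of another would multiply the depths, which gives an informal intuition for why a bound on the nesting depth matters.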

Dan Suciu