An Algebra of Scans

A parallel prefix circuit takes n inputs x_1, x_2, ..., x_n and produces the n outputs x_1, x_1 ∘ x_2, ..., x_1 ∘ x_2 ∘ ⋯ ∘ x_n, where '∘' is an arbitrary associative binary operation. Parallel prefix circuits and their software counterparts, parallel prefix computations or scans, have numerous applications, ranging from fast integer addition and parallel sorting to convex hull problems. A parallel prefix circuit can be implemented in a variety of ways, taking into account constraints on size, depth, or fan-out. Traditionally, implementations are defined either graphically or by enumerating the underlying graph. Both approaches have their pros and cons. A figure, if well drawn, conveys the possibly recursive structure of the scan, but it is not amenable to formal manipulation. A description in the form of a graph, while rigorous, obscures the structure of a scan and is equally hard to manipulate. In this paper we show that parallel prefix circuits enjoy a very pleasant algebra. Using only two basic building blocks and four combinators, all standard designs can be described succinctly and rigorously. The rules of the algebra allow us to prove the circuits correct and to derive circuit designs in a systematic manner.
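To make the input/output specification above concrete, the following is a minimal sketch of a scan in Python. It is not the paper's algebra of building blocks and combinators; the function names `scan_sequential` and `scan_divide_and_conquer` are hypothetical and chosen only for illustration. The second function uses one well-known recursive scheme (scan both halves, then combine the last output of the left half with every output of the right half) and relies only on the associativity of the operation.

```python
from itertools import accumulate
from typing import Callable, List, TypeVar

T = TypeVar("T")


def scan_sequential(op: Callable[[T, T], T], xs: List[T]) -> List[T]:
    """Reference scan: x_1, x_1 op x_2, ..., x_1 op ... op x_n, left to right."""
    return list(accumulate(xs, op))


def scan_divide_and_conquer(op: Callable[[T, T], T], xs: List[T]) -> List[T]:
    """Illustrative recursive scan (an assumption, not the paper's construction).

    The two recursive calls are independent of each other, so in a circuit
    they could be evaluated side by side. Correctness needs op to be
    associative; it does not need op to be commutative.
    """
    n = len(xs)
    if n <= 1:
        return list(xs)
    mid = n // 2
    left = scan_divide_and_conquer(op, xs[:mid])
    right = scan_divide_and_conquer(op, xs[mid:])
    carry = left[-1]  # x_1 op ... op x_mid
    # Prefix every output of the right half with the total of the left half.
    return left + [op(carry, r) for r in right]


if __name__ == "__main__":
    add = lambda a, b: a + b
    print(scan_sequential(add, [1, 2, 3, 4]))
    print(scan_divide_and_conquer(add, list("abcde")))
```

String concatenation is a convenient test operation here because it is associative but not commutative, so an implementation that reorders its operands would produce visibly wrong prefixes.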
