Experiments with List Ranking for Explicit Multi-Threaded (XMT) Instruction Parallelism

Algorithms for the list ranking problem are studied empirically on the Explicit Multi-Threaded (XMT) platform for instruction-level parallelism (ILP). The main goal of this study is to understand how XMT differs from more traditional parallel computing implementation platforms and models as they pertain to the well-studied list ranking problem. The two main findings are: (i) good speedups are possible for much smaller inputs; (ii) this finding rests, in part, on the competitive performance of a new variant of a 1984 algorithm, called the No-Cut algorithm. The paper incorporates analytic (non-asymptotic) performance analysis into experimental performance analysis for relatively small inputs, providing an interesting example where experimental research and theoretical analysis complement one another.

Explicit Multi-Threading (XMT) is a fine-grained computation framework introduced in our SPAA'98 paper. Building on some key ideas of parallel computing, XMT covers the spectrum from algorithms through architecture to implementation; the main implementation-related innovation in XMT was the incorporation of low-overhead hardware and software mechanisms for more effective fine-grained parallelism. The reader is referred to that paper for details on these mechanisms. The XMT platform aims at faster single-task completion time by way of ILP.
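The list ranking problem asks, for each node of a linked list, its distance to the tail. The paper's own algorithms (including the No-Cut variant) are not reproduced here; as a rough illustration of the problem and of the classic pointer-jumping baseline (Wyllie's algorithm), the sketch below simulates the O(log n) synchronous parallel rounds sequentially. The function name and the representation (a `next` array with -1 marking the tail) are our own, not the paper's.

```python
def list_rank(next_node):
    """Rank each node by its distance to the list tail using pointer
    jumping (Wyllie's algorithm), simulating synchronous parallel
    rounds one at a time."""
    n = len(next_node)
    nxt = list(next_node)
    # A node at the tail has rank 0; every other node starts at 1.
    rank = [0 if nxt[i] == -1 else 1 for i in range(n)]
    # Each round doubles the jump distance, so O(log n) rounds suffice.
    while any(p != -1 for p in nxt):
        new_rank = rank[:]
        new_nxt = nxt[:]
        for i in range(n):          # in XMT/PRAM, this loop runs in parallel
            if nxt[i] != -1:
                new_rank[i] = rank[i] + rank[nxt[i]]
                new_nxt[i] = nxt[nxt[i]]
        rank, nxt = new_rank, new_nxt
    return rank

# Example: the list 0 -> 1 -> 2 -> 3 yields ranks [3, 2, 1, 0].
```

Pointer jumping is not work-optimal (it performs O(n log n) work); the randomized coin-tossing approaches studied in the paper reduce the work, which is what makes the small-input comparison on XMT interesting.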

[1]  David A. Patterson, et al. Computer Organization & Design: The Hardware/Software Interface, 1993.

[2]  Richard Cole, et al. Faster Optimal Parallel Prefix Sums and List Ranking, 1989, Inf. Comput.

[3]  Gary L. Miller, et al. A Simple Randomized Parallel Algorithm for List-Ranking, 1990, Inf. Process. Lett.

[4]  David A. Patterson, et al. Computer Architecture: A Quantitative Approach (2nd ed.), 1996.

[5]  Jop F. Sibeyn, et al. Better Trade-offs for Parallel List Ranking, 1997, SPAA '97.

[6]  Jop F. Sibeyn, et al. Practical Parallel List Ranking, 1997, J. Parallel Distributed Comput.

[7]  Margaret Reid-Miller, et al. List Ranking and List Scan on the Cray C-90, 1994, SPAA '94.

[8]  Uzi Vishkin, et al. Randomized Speed-ups in Parallel Computation, 1984, STOC '84.

[9]  Uzi Vishkin, et al. From Algorithm Parallelism to Instruction-Level Parallelism: An Encode-Decode Chain Using Prefix-Sum, 1997, SPAA '97.

[10]  Gary L. Miller, et al. Parallel Tree Contraction and Its Application, 1985, 26th Annual Symposium on Foundations of Computer Science (FOCS 1985).

[11]  William J. Dally, et al. VLSI Architecture: Past, Present, and Future, 1999, Proceedings 20th Anniversary Conference on Advanced Research in VLSI.

[12]  Tsan-sheng Hsu, et al. Efficient Massively Parallel Implementation of Some Combinatorial Algorithms, 1996, Theor. Comput. Sci.

[13]  Richard Cole, et al. Deterministic Coin Tossing and Accelerating Cascades: Micro and Macro Techniques for Designing Parallel Algorithms, 1986, STOC '86.

[14]  Uzi Vishkin, et al. Explicit Multi-Threading (XMT) Bridging Models for Instruction Parallelism (Extended Abstract), 1998, SPAA '98.