Experiments with List Ranking for Explicit Multi-Threaded (XMT) Instruction Parallelism

Algorithms for the list ranking problem are studied empirically on the Explicit Multi-Threaded (XMT) platform for instruction-level parallelism (ILP). The main goal of this study is to understand how XMT differs from more traditional parallel computing implementation platforms and models as they pertain to the well-studied list ranking problem. The two main findings are: (i) good speedups are possible for much smaller inputs; (ii) this finding rests, in part, on the competitive performance of a new variant of a 1984 algorithm, called the No-Cut algorithm. The paper incorporates analytic (non-asymptotic) performance analysis into experimental performance analysis for relatively small inputs, providing an interesting example where experimental research and theoretical analysis complement one another.

Explicit Multi-Threading (XMT) is a fine-grained computation framework introduced in our SPAA'98 paper. Building on some key ideas of parallel computing, XMT covers the spectrum from algorithms through architecture to implementation; the main implementation-related innovation in XMT was the incorporation of low-overhead hardware and software mechanisms for more effective fine-grained parallelism. The reader is referred to that paper for details on these mechanisms. The XMT platform aims at faster single-task completion time by way of ILP.
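The list ranking problem asks, for each node of a linked list, its distance to the tail. The paper's own algorithms (including the No-Cut variant) are not reproduced here; as a rough illustration of the problem and of the classic pointer-jumping baseline (Wyllie's algorithm), the sketch below simulates the O(log n) synchronous parallel rounds sequentially. The function name and the representation (a `next` array with -1 marking the tail) are our own, not the paper's.

```python
def list_rank(next_node):
    """Rank each node by its distance to the list tail using pointer
    jumping (Wyllie's algorithm), simulating synchronous parallel
    rounds one at a time."""
    n = len(next_node)
    nxt = list(next_node)
    # A node at the tail has rank 0; every other node starts at 1.
    rank = [0 if nxt[i] == -1 else 1 for i in range(n)]
    # Each round doubles the jump distance, so O(log n) rounds suffice.
    while any(p != -1 for p in nxt):
        new_rank = rank[:]
        new_nxt = nxt[:]
        for i in range(n):          # in XMT/PRAM, this loop runs in parallel
            if nxt[i] != -1:
                new_rank[i] = rank[i] + rank[nxt[i]]
                new_nxt[i] = nxt[nxt[i]]
        rank, nxt = new_rank, new_nxt
    return rank

# Example: the list 0 -> 1 -> 2 -> 3 yields ranks [3, 2, 1, 0].
```

Pointer jumping is not work-optimal (it performs O(n log n) work); the randomized coin-tossing approaches studied in the paper reduce the work, which is what makes the small-input comparison on XMT interesting.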

[1]  David A. Patterson, et al. Computer Organization & Design: The Hardware/Software Interface, 1993.

[2]  Richard Cole, et al. Faster Optimal Parallel Prefix Sums and List Ranking, 1989, Inf. Comput.

[3]  Gary L. Miller, et al. A Simple Randomized Parallel Algorithm for List-Ranking, 1990, Inf. Process. Lett.

[4]  David A. Patterson, et al. Computer Architecture: A Quantitative Approach (2nd ed.), 1996.

[5]  Jop F. Sibeyn, et al. Better Trade-offs for Parallel List Ranking, 1997, SPAA '97.

[6]  Jop F. Sibeyn, et al. Practical Parallel List Ranking, 1997, J. Parallel Distributed Comput.

[7]  Margaret Reid-Miller, et al. List Ranking and List Scan on the Cray C-90, 1994, SPAA '94.

[8]  Uzi Vishkin, et al. Randomized Speed-ups in Parallel Computation, 1984, STOC '84.

[9]  Uzi Vishkin, et al. From Algorithm Parallelism to Instruction-Level Parallelism: An Encode-Decode Chain Using Prefix-Sum, 1997, SPAA '97.

[10]  Gary L. Miller, et al. Parallel Tree Contraction and Its Application, 1985, 26th Annual Symposium on Foundations of Computer Science (FOCS 1985).

[11]  William J. Dally, et al. VLSI Architecture: Past, Present, and Future, 1999, Proceedings 20th Anniversary Conference on Advanced Research in VLSI.

[12]  Tsan-sheng Hsu, et al. Efficient Massively Parallel Implementation of Some Combinatorial Algorithms, 1996, Theor. Comput. Sci.

[13]  Richard Cole, et al. Deterministic Coin Tossing and Accelerating Cascades: Micro and Macro Techniques for Designing Parallel Algorithms, 1986, STOC '86.

[14]  Uzi Vishkin, et al. Explicit Multi-Threading (XMT) Bridging Models for Instruction Parallelism (Extended Abstract), 1998, SPAA '98.