Single-pass Parallel Prefix Scan with Decoupled Lookback
[1] Guy E. Blelloch, et al. Scan primitives for vector computers, 1990, Proceedings SUPERCOMPUTING '90.
[2] Andrew S. Grimshaw, et al. Allocation-oriented algorithm design with application to gpu computing, 2011.
[3] Tack-Don Han, et al. Fast area-efficient VLSI adders, 1987, 1987 IEEE 8th Symposium on Computer Arithmetic (ARITH).
[4] Guy E. Blelloch, et al. Scans as Primitive Parallel Operations, 1989, ICPP.
[5] Harold S. Stone, et al. A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations, 1973, IEEE Transactions on Computers.
[6] H. T. Kung, et al. A Regular Layout for Parallel Adders, 1982, IEEE Transactions on Computers.
[7] Shubhabrata Sengupta, et al. Efficient Parallel Scan Algorithms for GPUs, 2011.
[8] Naga K. Govindaraju, et al. Fast scan algorithms on graphics processors, 2008, ICS '08.
[9] Jack Sklansky, et al. Conditional-Sum Addition Logic, 1960, IRE Trans. Electron. Comput.
[10] Leslie G. Valiant, et al. Universal circuits (Preliminary Report), 1976, STOC '76.
[11] W. Daniel Hillis, et al. Data parallel algorithms, 1986, CACM.
[12] Leslie M. Goldschlager, et al. A universal interconnection pattern for parallel computers, 1982, JACM.
[13] Allan Borodin, et al. On Relating Time and Space to Size and Depth, 1977, SIAM J. Comput.
[14] Andrew S. Grimshaw, et al. Parallel Scan for Stream Architectures, 2012.
[15] Marc Snir, et al. Depth-Size Trade-Offs for Parallel Prefix Computation, 1986, J. Algorithms.
[16] Shengen Yan, et al. StreamScan: fast scan algorithms for GPUs without global barrier synchronization, 2013, PPoPP '13.
[17] Tack-Don Han, et al. A Scalable Work-Efficient and Depth-Optimal Parallel Scan for the GPGPU Environment, 2013, IEEE Transactions on Parallel and Distributed Systems.
[18] Steven Fortune, et al. Parallelism in random access machines, 1978, STOC.
[19] Ralf Hinze. An Algebra of Scans, 2004, MPC.