Improving parallelism and locality with asynchronous algorithms
[1] Gérard M. Baudet, et al. Asynchronous Iterative Methods for Multiprocessors, 1978, JACM.
[2] John N. Tsitsiklis, et al. Convergence rate and termination of asynchronous iterative algorithms, 1989, ICS '89.
[3] D. Szyld, et al. Asynchronous two-stage iterative methods, 1994.
[4] William Pugh, et al. Exploiting Monotone Convergence Functions in Parallel Programs, 1996, LCPC.
[5] Matteo Frigo, et al. Cache-oblivious algorithms, 1999, 40th Annual Symposium on Foundations of Computer Science.
[6] Yuan Shi, et al. Timing Models and Local Stopping Criteria for Asynchronous Iterative Algorithms, 1999, J. Parallel Distributed Comput.
[7] Zhiyuan Li, et al. New tiling techniques to improve cache temporal locality, 1999, PLDI '99.
[8] Craig C. Douglas, et al. A Tutorial on Elliptic PDE Solvers and Their Parallelization, 2003.
[9] Charles E. Leiserson, et al. Cache-Oblivious Algorithms, 2003, CIAC.
[10] Jingling Xue, et al. Code tiling for improving the cache performance of PDE solvers, 2003, 2003 International Conference on Parallel Processing.
[11] David A. Padua, et al. Programming for parallelism and locality with hierarchically tiled arrays, 2006, PPoPP '06.
[12] Sadaf R. Alam, et al. Characterization of Scientific Workloads on Systems with Multi-Core Processors, 2006, 2006 IEEE International Symposium on Workload Characterization.
[13] Volker Strumpen, et al. The memory behavior of cache oblivious stencil computations, 2007, The Journal of Supercomputing.
[14] Samuel Williams, et al. Optimization of sparse matrix-vector multiplication on emerging multicore platforms, 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).
[15] Sanjay V. Rajopadhye, et al. Towards Optimal Multi-level Tiling for Stencil Computations, 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.
[16] Yuan Shi, et al. Parallel Processing of Linear Systems Using Asynchronous Iterative Algorithms, 2007.
[17] Zhiyuan Li, et al. ASYNC Loop Constructs for Relaxed Synchronization, 2008, LCPC.
[18] Lixia Liu, et al. Analyzing memory access intensity in parallel programs on multicore, 2008, ICS '08.
[19] Uday Bondhugula, et al. A practical automatic polyhedral parallelizer and locality optimizer, 2008, PLDI '08.
[20] Richard W. Vuduc, et al. Tuned and wildly asynchronous stencil kernels for hybrid CPU/GPU systems, 2009, ICS.