CapelliniSpTRSV: A Thread-Level Synchronization-Free Sparse Triangular Solve on GPUs
Bingsheng He | Weifeng Liu | Feng Zhang | Xiaoyong Du | Rujia Wang | Ruofan Wu | Jiya Su