Conflict-free symmetric sparse matrix-vector multiplication on multicore architectures
[1] Nectarios Koziris,et al. Performance Analysis and Optimization of Sparse Matrix-Vector Multiplication on Intel Xeon Phi , 2017, 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).
[2] Andreas Frommer,et al. Block colouring schemes for the SOR method on local memory parallel computers , 1990, Parallel Comput..
[3] Ümit V. Çatalyürek,et al. A framework for scalable greedy coloring on distributed-memory parallel computers , 2008, J. Parallel Distributed Comput..
[4] Leland L. Beck,et al. Smallest-last ordering and clustering and graph coloring algorithms , 1983, JACM.
[5] Y. Saad. Numerical Methods for Large Eigenvalue Problems , 2011 .
[6] Kivanc Dincer,et al. A Comparison of Parallel Graph Coloring Algorithms , 1995 .
[7] Timothy A. Davis,et al. The University of Florida sparse matrix collection , 2011, TOMS.
[8] Michael B. Giles,et al. Renumbering unstructured grids to improve the performance of codes on hierarchical memory machines , 1997 .
[9] Mark T. Jones,et al. A Parallel Graph Coloring Heuristic , 1993, SIAM J. Sci. Comput..
[10] I. Duff,et al. The effect of ordering on preconditioned conjugate gradients , 1989 .
[11] Hiroshi Nakashima,et al. Algebraic Block Multi-Color Ordering Method for Parallel Multi-Threaded Sparse Triangular Solver in ICCG Method , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.
[12] Ronald L. Rivest,et al. Introduction to Algorithms, third edition , 2009 .
[13] Vicente H. F. Batista,et al. Parallel structurally-symmetric sparse matrix-vector products on multi-core processors , 2010, ArXiv.
[14] P. Sadayappan,et al. On improving the performance of sparse matrix-vector multiplication , 1997, Proceedings Fourth International Conference on High-Performance Computing.
[15] Nectarios Koziris,et al. Combining HTM with RCU to Speed Up Graph Coloring on Multicore Platforms , 2018, ISC.
[16] Samuel Williams,et al. Optimization of sparse matrix-vector multiplication on emerging multicore platforms , 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).
[17] Nectarios Koziris,et al. Improving the Performance of the Symmetric Sparse Matrix-Vector Multiplication in Multicore , 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.
[18] Udo W. Pooch,et al. A Survey of Indexing Techniques for Sparse Matrices , 1973, CSUR.
[19] Ümit V. Çatalyürek,et al. Graph coloring algorithms for multi-core and massively multithreaded architectures , 2012, Parallel Comput..
[20] Jack Dongarra,et al. The TOP500: History, Trends, and Future Directions in High Performance Computing , 2020 .
[21] Richard W. Vuduc,et al. Sparsity: Optimization Framework for Sparse Matrix Kernels , 2004, Int. J. High Perform. Comput. Appl..
[22] Nectarios Koziris,et al. BASMAT: bottleneck-aware sparse matrix-vector multiplication auto-tuning on GPGPUs , 2019, PPoPP.
[23] Michael A. Heroux,et al. Improving Performance via Mini-applications , 2009, Sandia Report.
[24] Ümit V. Çatalyürek,et al. Hypergraph-Partitioning-Based Decomposition for Parallel Sparse-Matrix Vector Multiplication , 1999, IEEE Trans. Parallel Distributed Syst..
[25] Leonid Oliker,et al. Effects of Ordering Strategies and Programming Paradigms on Sparse Matrix Computations , 2013, SIAM Rev..
[26] Assefaw Hadish Gebremedhin,et al. Scalable parallel graph coloring algorithms , 2000, Concurr. Pract. Exp..
[27] Samuel Williams,et al. Reduced-Bandwidth Multithreaded Algorithms for Sparse Matrix-Vector Multiplication , 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.
[28] Joel H. Saltz,et al. The Design and Implementation of a Parallel Unstructured Euler Solver Using Software Primitives , 1992, ICASE Report No. 92-12.
[29] Maria Ganzha,et al. Utilizing Recursive Storage in Sparse Matrix-Vector Multiplication - Preliminary Considerations , 2010, CATA.
[30] Samuel Williams,et al. Roofline: an insightful visual performance model for multicore architectures , 2009, CACM.
[31] Chuck Pheatt,et al. Intel® threading building blocks , 2008 .
[32] Gerhard Wellein,et al. A Unified Sparse Matrix Data Format for Efficient General Sparse Matrix-Vector Multiplication on Modern Processors with Wide SIMD Units , 2013, SIAM J. Sci. Comput..
[33] Richard W. Vuduc,et al. Model-driven autotuning of sparse matrix-vector multiply on GPUs , 2010, PPoPP '10.
[34] Mark T. Jones,et al. Parallel Heuristics for Improved, Balanced Graph Colorings , 1996, J. Parallel Distributed Comput..
[35] Nectarios Koziris,et al. CSX: an extended compression format for spmv on shared memory systems , 2011, PPoPP '11.
[36] D. J. A. Welsh,et al. An upper bound for the chromatic number of a graph and its application to timetabling problems , 1967, Comput. J..
[37] Nectarios Koziris,et al. Performance Analysis and Optimization of Sparse Matrix-Vector Multiplication on Modern Multi- and Many-Core Processors , 2017, 2017 46th International Conference on Parallel Processing (ICPP).
[38] Francisco F. Rivera,et al. Optimization of sparse matrix-vector multiplication using reordering techniques on GPUs , 2012, Microprocess. Microsystems.
[39] E. Cuthill,et al. Reducing the bandwidth of sparse symmetric matrices , 1969, ACM '69.
[40] Arutyun Avetisyan,et al. Automatically Tuning Sparse Matrix-Vector Multiplication for GPU Architectures , 2010, HiPEAC.
[41] Nectarios Koziris,et al. SparseX: A Library for High-Performance Sparse Matrix-Vector Multiplication on Multicore Platforms , 2018, ACM Trans. Math. Softw..
[42] Michael Garland,et al. Implementing sparse matrix-vector multiplication on throughput-oriented processors , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[43] Michele Martone,et al. Efficient multithreaded untransposed, transposed or symmetric sparse matrix-vector multiplication with the Recursive Sparse Blocks format , 2014, Parallel Comput..
[44] Charles E. Leiserson,et al. Ordering heuristics for parallel graph coloring , 2014, SPAA.
[45] M. Hestenes,et al. Methods of conjugate gradients for solving linear systems , 1952 .
[46] John R. Gilbert,et al. Parallel sparse matrix-vector and matrix-transpose-vector multiplication using compressed sparse blocks , 2009, SPAA '09.
[47] Richard P. Brent,et al. The Parallel Evaluation of General Arithmetic Expressions , 1974, JACM.
[48] Pavel Tvrdík,et al. Evaluation Criteria for Sparse Matrix Storage Formats , 2016, IEEE Transactions on Parallel and Distributed Systems.
[49] Nectarios Koziris,et al. Optimizing sparse matrix-vector multiplication using index and value compression , 2008, CF '08.
[50] Ninghui Sun,et al. SMAT: an input adaptive auto-tuner for sparse matrix-vector multiplication , 2013, PLDI.
[51] Hyun Jin Moon,et al. Fast Sparse Matrix-Vector Multiplication by Exploiting Variable Block Structure , 2005, HPCC.
[52] Gerhard Wellein,et al. A Recursive Algebraic Coloring Technique for Hardware-efficient Symmetric Sparse Matrix-vector Multiplication , 2019, ACM Trans. Parallel Comput..
[53] Nectarios Koziris,et al. Performance evaluation of the sparse matrix-vector multiplication on modern architectures , 2009, The Journal of Supercomputing.