Parallel Transposition of Sparse Data Structures
Weifeng Liu | Wu-chun Feng | Hao Wang | Kaixi Hou
[1] Lars Karlsson,et al. Parallel and Cache-Efficient In-Place Matrix Storage Format Conversion , 2012, TOMS.
[2] Franz Franchetti,et al. Automatic SIMD vectorization of fast Fourier transforms for the Larrabee and AVX instruction sets , 2011, ICS '11.
[3] Shengen Yan,et al. StreamScan: fast scan algorithms for GPUs without global barrier synchronization , 2013, PPoPP '13.
[4] Pradeep Dubey,et al. Parallel Efficient Sparse Matrix-Matrix Multiplication on Multicore Platforms , 2015, ISC.
[5] Frank Dellaert,et al. Square Root SAM: Simultaneous Localization and Mapping via Square Root Information Smoothing , 2006, Int. J. Robotics Res..
[6] Srinivasan Parthasarathy,et al. Fast Sparse Matrix-Vector Multiplication on GPUs for Graph Applications , 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.
[7] Ingemar J. Cox,et al. Dynamic Map Building for an Autonomous Mobile Robot , 1990, IEEE International Workshop on Intelligent Robots and Systems, Towards a New Frontier of Applications.
[8] Naga K. Govindaraju,et al. Auto-tuning of fast Fourier transform on graphics processors , 2011, PPoPP '11.
[9] P. Sadayappan,et al. An efficient two-dimensional blocking strategy for sparse matrix-vector multiplication on GPUs , 2014, ICS '14.
[10] Hiroshi Inoue,et al. SIMD- and Cache-Friendly Algorithm for Sorting an Array of Structures , 2015, Proc. VLDB Endow..
[11] I. Duff,et al. Direct Methods for Sparse Matrices , 1987 .
[12] John R. Gilbert,et al. Parallel sparse matrix-vector and matrix-transpose-vector multiplication using compressed sparse blocks , 2009, SPAA '09.
[13] Juan Gómez-Luna,et al. In-place transposition of rectangular matrices on accelerators , 2014, PPoPP '14.
[14] Timothy A. Davis,et al. The University of Florida sparse matrix collection , 2011, TOMS.
[15] Roland W. Freund,et al. A Transpose-Free Quasi-Minimal Residual Algorithm for Non-Hermitian Linear Systems , 1993, SIAM J. Sci. Comput..
[16] Brian Vinter,et al. Speculative segmented sum for sparse matrix-vector multiplication on heterogeneous processors , 2015, Parallel Comput..
[17] John R. Gilbert,et al. On the representation and multiplication of hypersparse matrices , 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[18] Pradeep Dubey,et al. Fast sort on CPUs and GPUs: a case for bandwidth oblivious SIMD sort , 2010, SIGMOD Conference.
[20] Patrick R. Amestoy,et al. An Approximate Minimum Degree Ordering Algorithm , 1996, SIAM J. Matrix Anal. Appl..
[21] J. Navarro-Pedreño. Numerical Methods for Least Squares Problems , 1996 .
[22] Richard Durbin,et al. Extending reference assembly models , 2015, Genome Biology.
[23] Robert E. Tarjan,et al. Depth-First Search and Linear Graph Algorithms , 1972, SIAM J. Comput..
[24] Leonid Oliker,et al. HipMer: an extreme-scale de novo genome assembler , 2015, SC15: International Conference for High Performance Computing, Networking, Storage and Analysis.
[25] R. Fletcher. Conjugate gradient methods for indefinite systems , 1976 .
[26] John R. Gilbert,et al. The Combinatorial BLAS: design, implementation, and applications , 2011, Int. J. High Perform. Comput. Appl..
[27] Brian Vinter,et al. CSR5: An Efficient Storage Format for Cross-Platform Sparse Matrix-Vector Multiplication , 2015, ICS.
[28] David A. Bader,et al. GPU merge path: a GPU merging algorithm , 2012, ICS '12.
[29] Srinivasan Parthasarathy,et al. Automatic Selection of Sparse Matrix Representation on GPUs , 2015, ICS.
[30] Yves Robert,et al. STS-k: a multilevel sparse triangular solution scheme for NUMA multicores , 2015, SC15: International Conference for High Performance Computing, Networking, Storage and Analysis.
[31] David A. Bader,et al. Graph Partitioning and Graph Clustering , 2013 .
[32] Michael Garland,et al. A decomposition for in-place matrix transposition , 2014, PPoPP '14.
[33] Wu-chun Feng,et al. AAlign: A SIMD Framework for Pairwise Sequence Alignment on x86-Based Multi-and Many-Core Processors , 2016, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[34] Fred G. Gustavson,et al. Two Fast Algorithms for Sparse Matrices: Multiplication and Permuted Transposition , 1978, TOMS.
[35] Timothy A. Davis,et al. Direct methods for sparse linear systems , 2006, Fundamentals of algorithms.
[36] Wu-chun Feng,et al. ASPaS: A Framework for Automatic SIMDization of Parallel Sorting on x86-based Many-core Processors , 2015, ICS.
[37] Sabela Ramos,et al. Modeling communication in cache-coherent SMP systems: a case-study with Xeon Phi , 2013, HPDC.
[38] John Beidler,et al. Data Structures and Algorithms , 1996, Wiley Encyclopedia of Computer Science and Engineering.
[39] Ingemar J. Cox,et al. Dynamic Map Building for an Autonomous Mobile Robot , 1992 .
[40] Kunle Olukotun,et al. On fast parallel detection of strongly connected components (SCC) in small-world graphs , 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[41] Wu-chun Feng,et al. cuART: Fine-Grained Algebraic Reconstruction Technique for Computed Tomography Images on GPUs , 2016, 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid).
[42] Yousef Saad,et al. Iterative methods for sparse linear systems , 2003 .
[43] R. Freund,et al. QMR: a quasi-minimal residual method for non-Hermitian linear systems , 1991 .
[44] Ulrich Meyer,et al. GPU multisplit , 2016, PPoPP.
[45] Brian Vinter,et al. A framework for general sparse matrix-matrix multiplication on GPUs and heterogeneous processors , 2015, J. Parallel Distributed Comput..
[46] Samuel Williams,et al. Reduced-Bandwidth Multithreaded Algorithms for Sparse Matrix-Vector Multiplication , 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.
[47] Rafael Asenjo,et al. Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming , 2016, PPOPP.
[48] Lawrence Rauchwerger,et al. Finding strongly connected components in distributed graphs , 2005, J. Parallel Distributed Comput..
[49] John R. Gilbert,et al. Challenges and Advances in Parallel Sparse Matrix-Matrix Multiplication , 2008, 2008 37th International Conference on Parallel Processing.