Scalable and efficient implementation of 3D unstructured meshes computation: a case study on matrix assembly
[1] Ellis Horowitz,et al. Divide-and-Conquer for Parallel Processing , 1983, IEEE Transactions on Computers.
[2] Guy E. Blelloch,et al. Programming parallel algorithms , 1996, CACM.
[3] Ryan Newton,et al. A Synergetic Approach to Throughput Computing on x86-Based Multicore Desktops , 2011, IEEE Software.
[4] Matteo Frigo,et al. The implementation of the Cilk-5 multithreaded language , 1998, PLDI.
[5] Marcin Paprzycki,et al. Assembling recursively stored sparse matrices , 2010, Proceedings of the International Multiconference on Computer Science and Information Technology.
[6] Sanjay J. Patel,et al. WAYPOINT: scaling coherence to thousand-core architectures , 2010, PACT '10.
[7] E. Cuthill,et al. Reducing the bandwidth of sparse symmetric matrices , 1969, ACM '69.
[8] Charbel Farhat,et al. A general approach to nonlinear FE computations on shared-memory multiprocessors , 1989 .
[9] Eitan Grinspun,et al. Sparse matrix solvers on the GPU: conjugate gradients and multigrid , 2003, SIGGRAPH Courses.
[10] Laura Grigori,et al. Parallel design and performance of nested filtering factorization preconditioner , 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[11] Paul H. J. Kelly,et al. Hybrid OpenMP/MPI anisotropic mesh smoothing , 2012, ICCS.
[12] Pradeep Dubey,et al. Efficient Shared-Memory Implementation of High-Performance Conjugate Gradient Benchmark and its Application to Unstructured Matrices , 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.
[13] Dirk Schmidl,et al. Assessing the Performance of OpenMP Programs on the Intel Xeon Phi , 2013, Euro-Par.
[14] Alex Pothen,et al. ColPack: Software for graph coloring and related problems in scientific computing , 2013, TOMS.
[15] Anoop Gupta,et al. The directory-based cache coherence protocol for the DASH multiprocessor , 1990, ISCA '90.
[16] Christina Freytag,et al. Using MPI: Portable Parallel Programming with the Message-Passing Interface , 2016 .
[17] Eric Petit,et al. Divide and Conquer Parallelization of Finite Element Method Assembly , 2013, PARCO.
[18] Marcin Paprzycki,et al. Use of hybrid recursive CSR/COO data structures in sparse matrix-vector multiplication , 2010, Proceedings of the International Multiconference on Computer Science and Information Technology.
[19] Leonid Oliker,et al. Effects of Ordering Strategies and Programming Paradigms on Sparse Matrix Computations , 2013, SIAM Rev..
[20] Eric Darve,et al. Assembly of finite element methods on graphics processors , 2011 .
[21] David A. Ham,et al. Finite element assembly strategies on multi‐core and many‐core architectures , 2013 .
[22] Bruno Raffin,et al. Cache-Efficient Parallel Isosurface Extraction for Shared Cache Multicores , 2010, EGPGV@Eurographics.
[23] William Gropp,et al. Using MPI: Portable Parallel Programming with the Message-Passing Interface , 1994 .
[24] Victor Eijkhout,et al. Recursive approach in sparse matrix LU factorization , 2001, Sci. Program..
[25] Julien Langou,et al. A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures , 2007, Parallel Comput..