FRPA: A Framework for Recursive Parallel Algorithms
James Demmel | Armando Fox | Omer Spillinger | David Eliahu