Dissecting sequential programs for parallelization—An approach based on computational units
Felix Wolf | Ali Jannesari | Rohit Atre | Zia Ul Huda
[1] Xiangyu Zhang,et al. Alchemist: A Transparent Dependence Distance Profiling Infrastructure , 2009, 2009 International Symposium on Code Generation and Optimization.
[2] Saturnino Garcia,et al. Kremlin: rethinking and rebooting gprof for the multicore age , 2011, PLDI '11.
[3] Zhen Li,et al. Unveiling parallelization opportunities in sequential programs , 2016, J. Syst. Softw.
[4] Alejandro Duran,et al. Barcelona OpenMP Tasks Suite: A Set of Benchmarks Targeting the Exploitation of Task Parallelism in OpenMP , 2009, 2009 International Conference on Parallel Processing.
[5] David I. August,et al. Automatically exploiting cross-invocation parallelism using runtime information , 2013, Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[6] Zhen Li,et al. Discovery of Potential Parallelism in Sequential Programs , 2013, 2013 42nd International Conference on Parallel Processing.
[7] Nicholas Nethercote,et al. Valgrind: a framework for heavyweight dynamic binary instrumentation , 2007, PLDI '07.
[8] Frederica Darema,et al. The SPMD Model: Past, Present and Future , 2001, PVM/MPI.
[9] Arthur J. Bernstein,et al. Analysis of Programs for Parallel Processing , 1966, IEEE Trans. Electron. Comput.
[10] Koen De Bosschere,et al. A profile-based tool for finding pipeline parallelism in sequential programs , 2010, Parallel Comput.
[11] Ian T. Foster,et al. Compiler Techniques for Massively Scalable Implicit Task Parallelism , 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.
[12] Felix Wolf,et al. Using Template Matching to Infer Parallel Design Patterns , 2015, ACM Trans. Archit. Code Optim.
[13] Wilson C. Hsieh,et al. Automatic generation of nested, fork-join parallelism , 2004, The Journal of Supercomputing.
[14] Michael Allen,et al. Parallel programming: techniques and applications using networked workstations and parallel computers , 1998.
[15] Keshav Pingali,et al. The tao of parallelism in algorithms , 2011, PLDI '11.
[16] Kai Li,et al. The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[17] Zhen Li,et al. An Efficient Data-Dependence Profiler for Sequential and Parallel Programs , 2015, 2015 IEEE International Parallel and Distributed Processing Symposium.
[18] Vikram S. Adve,et al. LLVM: a compilation framework for lifelong program analysis & transformation , 2004, International Symposium on Code Generation and Optimization, 2004. CGO 2004.
[19] Rainer Leupers,et al. MAPS: An integrated framework for MPSoC application parallelization , 2008, 2008 45th ACM/IEEE Design Automation Conference.
[20] David H. Bailey,et al. The NAS Parallel Benchmarks , 1991, Int. J. High Perform. Comput. Appl.
[21] P. O. Bobbie. Partitioning programs for parallel execution: A case study in the Intel iPSC/2 environment , 1997.
[22] V. Sarkar,et al. Automatic partitioning of a program dependence graph into parallel tasks , 1991, IBM J. Res. Dev.
[23] Chi Ching Chi,et al. A Benchmark Suite for Evaluating Parallel Programming Models: Introduction and Preliminary Results , 2011.
[24] Felix Wolf,et al. Brief Announcement: Meeting the Challenges of Parallelizing Sequential Programs , 2017, SPAA.
[25] Chi Ching Chi,et al. A Benchmark Suite for Evaluating Parallel Programming Models , 2011.
[26] Ken Kennedy,et al. Optimizing Compilers for Modern Architectures: A Dependence-based Approach , 2001.
[27] Felix Wolf,et al. Automatic Parallel Pattern Detection in the Algorithm Structure Design Space , 2016, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[28] James Reinders,et al. Intel® threading building blocks , 2008.
[29] Guilherme Ottoni,et al. Automatic thread extraction with decoupled software pipelining , 2005, 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05).
[30] Philippe Clauss,et al. Profiling Data-Dependence to Assist Parallelization: Framework, Scope, and Optimization , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[31] Easwaran Raman,et al. Parallel-stage decoupled software pipelining , 2008, CGO '08.
[32] Felix Wolf,et al. The Basic Building Blocks of Parallel Tasks , 2015, COSMIC@CGO.