Predicting Potential Speedup of Serial Code via Lightweight Profiling and Emulations with Memory Performance Model
Pranith Kumar | Hyesoon Kim | Minjang Kim | Bevin Brett
[1] O. Mutlu,et al. Fairness via source throttling: a configurable and high-performance fairness substrate for multi-core memory systems , 2010, ASPLOS XV.
[2] John L. Gustafson,et al. Reevaluating Amdahl's law , 1988, CACM.
[3] Peng Wu,et al. Compiler-Driven Dependence Profiling to Guide Program Parallelization , 2008, LCPC.
[4] Yuxiong He,et al. The Cilkview scalability analyzer , 2010, SPAA '10.
[5] G. Amdahl,et al. Validity of the single processor approach to achieving large scale computing capabilities , 1967, AFIPS '67 (Spring).
[6] Stijn Eyerman,et al. Modeling critical sections in Amdahl's law and its implications for multicore design , 2010, ISCA '10.
[7] Alan H. Karp,et al. Measuring parallel processor performance , 1990, CACM.
[8] Hyesoon Kim,et al. SD3: A Scalable Approach to Dynamic Data-Dependence Profiling , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.
[9] Mark D. Hill,et al. Amdahl's Law in the Multicore Era , 2008, Computer.
[10] Saturnino Garcia,et al. Kismet: parallel speedup estimates for serial programs , 2011, OOPSLA '11.
[11] Hsien-Hsin S. Lee,et al. Smart Refresh: An Enhanced Memory Controller Design for Reducing Energy in Conventional and 3D Die-Stacked DRAMs , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).
[12] Michael Frumkin,et al. The OpenMP Implementation of NAS Parallel Benchmarks and its Performance , 2013.
[13] Jian Li,et al. Memory Latency Reduction via Thread Throttling , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.
[14] James R. Larus,et al. Loop-Level Parallelism in Numeric and Symbolic Programs , 1993, IEEE Trans. Parallel Distributed Syst..
[15] Vassilios V. Dimakopoulos,et al. A Microbenchmark Study of OpenMP Overheads under Nested Parallelism , 2008, IWOMP.
[16] Mary K. Vernon,et al. Parallel program performance prediction using deterministic task graph analysis , 2004, TOCS.
[17] Harish Patil,et al. Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.
[18] Xiangyu Zhang,et al. Alchemist: A Transparent Dependence Distance Profiling Infrastructure , 2009, 2009 International Symposium on Code Generation and Optimization.
[19] Saturnino Garcia,et al. Kremlin: rethinking and rebooting gprof for the multicore age , 2011, PLDI '11.
[20] Mingyu Chen,et al. Extending Amdahl's law in the multicore era , 2009, SIGMETRICS Perform. Evaluation Rev..
[21] William Thies,et al. A Practical Approach to Exploiting Coarse-Grained Pipeline Parallelism in C Programs , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).
[22] J. Mark Bull,et al. A microbenchmark suite for OpenMP 2.0 , 2001, CARN.
[23] Rajesh Bordawekar,et al. Modeling optimistic concurrency using quantitative dependence analysis , 2008, PPOPP.