Multiversioned decoupled access-execute: the key to energy-efficient compilation of general-purpose programs
Stefanos Kaxiras | Per Ekemark | Alexandra Jimborean | Georgios Zacharopoulos | Konstantinos Koukos | Vasileios Spiliopoulos
[1] Sharad Malik,et al. Compile-time dynamic voltage scaling settings: opportunities and limits , 2003, PLDI '03.
[2] A. Jaleel. Memory Characterization of Workloads Using Instrumentation-Driven Simulation: A Pin-based Memory Characterization of the SPEC CPU 2000 and SPEC CPU 2006 Benchmark Suites , 2022 .
[3] Stefanos Kaxiras,et al. Green governors: A framework for Continuously Adaptive DVFS , 2011, 2011 International Green Computing Conference and Workshops.
[4] Margaret Martonosi,et al. Computer Architecture Techniques for Power-Efficiency , 2008, Computer Architecture Techniques for Power-Efficiency.
[5] Tomofumi Yuki,et al. Folklore Confirmed: Compiling for Speed = Compiling for Energy , 2013, LCPC.
[6] Joel H. Saltz,et al. Run-time parallelization and scheduling of loops , 1989, SPAA '89.
[7] Vincent Loechner,et al. Dynamic and Speculative Polyhedral Parallelization Using Compiler-Generated Skeletons , 2013, International Journal of Parallel Programming.
[8] Josep Torrellas,et al. An efficient algorithm for the run-time parallelization of DOACROSS loops , 1994, Proceedings of Supercomputing '94.
[9] David Black-Schaffer,et al. Towards more efficient execution: a decoupled access-execute approach , 2013, ICS '13.
[10] Xipeng Shen,et al. Space-efficient multi-versioning for input-adaptive feedback-driven program optimizations , 2014, OOPSLA.
[11] Ravi Mirchandaney,et al. Run-Time Parallelization and Scheduling of Loops , 1991 .
[12] Stijn Eyerman,et al. A Counter Architecture for Online DVFS Profitability Estimation , 2010, IEEE Transactions on Computers.
[13] George Ho,et al. PAPI: A Portable Interface to Hardware Performance Counters , 1999 .
[14] Barbara G. Ryder,et al. Blended analysis for performance understanding of framework-based applications , 2007, ISSTA '07.
[15] John L. Henning. SPEC CPU2006 benchmark descriptions , 2006, CARN.
[16] R.H. Dennard,et al. Design of Ion-Implanted MOSFET's with Very Small Physical Dimensions , 1974, Proceedings of the IEEE.
[17] Martin Hirzel,et al. Dynamic hot data stream prefetching for general-purpose programs , 2002, PLDI '02.
[18] K. Steinhubl. Design of Ion-Implanted MOSFET's with Very Small Physical Dimensions , 1974 .
[19] Yale N. Patt,et al. Predicting Performance Impact of DVFS for Realistic Memory Systems , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[20] David Black-Schaffer,et al. Fix the code. Don't tweak the hardware: A new compiler approach to Voltage-Frequency scaling , 2014, CGO '14.
[21] Dean M. Tullsen,et al. Mitosis compiler: an infrastructure for speculative threading based on pre-computation slices , 2005, PLDI '05.
[22] Shigeru Chiba,et al. A New Optimization Technique for the Inspector-Executor Method , 2002, IASTED PDCS.
[23] Mark Horowitz,et al. Energy dissipation in general purpose microprocessors , 1996, IEEE J. Solid State Circuits.
[24] Martin Hirzel,et al. Bursty Tracing: A Framework for Low-Overhead Temporal Profiling , 2001 .
[25] Grigori Fursin,et al. Finding representative sets of optimizations for adaptive multiversioning applications , 2009, ArXiv.
[26] Matthias Hauswirth,et al. Low-overhead memory leak detection using adaptive statistical profiling , 2004, ASPLOS XI.
[27] Lingjia Tang,et al. Protean Code: Achieving Near-Free Online Code Transformations for Warehouse Scale Computers , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[28] Vincent Loechner,et al. VMAD: A virtual machine for advanced dynamic analysis of programs , 2011, IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[29] Dean M. Tullsen,et al. Inter-core prefetching for multicore processors using migrating helper threads , 2011, ASPLOS XVI.
[30] Stéphan Jourdan,et al. Haswell: The Fourth-Generation Intel Core Processor , 2014, IEEE Micro.
[31] Matthew Arnold,et al. A framework for reducing the cost of instrumented code , 2001, PLDI '01.
[32] Juan Touriño,et al. An Inspector-Executor Algorithm for Irregular Assignment Parallelization , 2004, ISPA.
[33] P. Sadayappan,et al. A Compiler Analysis to Determine Useful Cache Size for Energy Efficiency , 2013, 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum.
[34] T. K. Prakash,et al. Performance Characterization of SPEC CPU 2006 Benchmarks on Intel Core 2 Duo Processor .
[35] Wen-mei W. Hwu,et al. Parboil: A Revised Benchmark Suite for Scientific and Commercial Throughput Computing , 2012 .
[36] Nancy M. Amato,et al. Run-time methods for parallelizing partially parallel loops , 1995, ICS '95.
[37] Xipeng Shen,et al. An input-centric paradigm for program dynamic optimizations , 2010, OOPSLA.
[38] Vincent Loechner,et al. VMAD: An Advanced Dynamic Program Analysis and Instrumentation Framework , 2012, CC.
[39] James E. Smith,et al. Decoupled access/execute computer architectures , 1984, TOCS.
[40] Satish Narayanasamy,et al. LiteRace: effective sampling for lightweight data-race detection , 2009, PLDI '09.
[41] Stefanos Kaxiras,et al. Interval-based models for run-time DVFS orchestration in superscalar processors , 2010, CF '10.
[42] Weifeng Zhang,et al. Accelerating and Adapting Precomputation Threads for Efficient Prefetching , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.
[43] Xuan Chen,et al. Adaptive Multi-versioning for OpenMP Parallelization via Machine Learning , 2009, 2009 15th International Conference on Parallel and Distributed Systems.