CATA: Criticality Aware Task Acceleration for Multicore Processors
Eduard Ayguadé | Mateo Valero | Jesús Labarta | Ramón Beivide | Marc Casas | Miquel Moretó | Rosa M. Badia | Kallia Chronaki | José Luis Bosque | Lluc Alvarez | Enrique Vallejo | Emilio Castillo
[1] Kai Li, et al. The PARSEC benchmark suite: Characterization and architectural implications, 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[2] Eduard Ayguadé, et al. Criticality-Aware Dynamic Task Scheduling for Heterogeneous Architectures, 2015, ICS.
[3] Margaret Martonosi, et al. Thread criticality predictors for dynamic performance, power, and resource management in chip multiprocessors, 2009, ISCA '09.
[4] Yale N. Patt, et al. Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches, 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[5] Radu Teodorescu, et al. Mitigating the Effects of Process Variation in Ultra-low Voltage Chip Multiprocessors using Dual Supply Voltages and Half-Speed Units, 2012, IEEE Computer Architecture Letters.
[6] Eduard Ayguadé, et al. Runtime-Aware Architectures: A First Approach, 2014, Supercomput. Front. Innov.
[7] Christopher J. Hughes, et al. Carbon: architectural support for fine-grained parallelism on chip multiprocessors, 2007, ISCA '07.
[8] Dimitrios S. Nikolopoulos, et al. A Unified Scheduler for Recursive and Task Dataflow Parallelism, 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.
[9] Alejandro Duran, et al. Ompss: a Proposal for Programming Heterogeneous Multi-Core Architectures, 2011, Parallel Process. Lett.
[10] P. Hanrahan, et al. Sequoia: Programming the Memory Hierarchy, 2006, ACM/IEEE SC 2006 Conference (SC'06).
[11] Margaret Martonosi, et al. Computer Architecture Techniques for Power-Efficiency, 2008, Computer Architecture Techniques for Power-Efficiency.
[12] Meeta Sharma Gupta, et al. System level analysis of fast, per-core DVFS using on-chip switching regulators, 2008, 2008 IEEE 14th International Symposium on High Performance Computer Architecture.
[13] Ulrich Kremer, et al. The design, implementation, and evaluation of a compiler algorithm for CPU energy reduction, 2003, PLDI '03.
[14] Per Stenström, et al. Efficient Forwarding of Producer-Consumer Data in Task-Based Programs, 2013, 2013 42nd International Conference on Parallel Processing.
[15] Cédric Augonnet, et al. StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, 2011, Concurr. Comput. Pract. Exp.
[16] Norman P. Jouppi, et al. CACTI 6.0: A Tool to Model Large Caches, 2009.
[17] Vivek Sarkar, et al. Chunking parallel loops in the presence of synchronization, 2009, ICS.
[18] Francisco J. Cazorla, et al. Software-Controlled Priority Characterization of POWER5 Processor, 2008, 2008 International Symposium on Computer Architecture.
[19] Somayeh Sardashti, et al. The gem5 simulator, 2011, CARN.
[20] Margaret Martonosi, et al. Techniques for Multicore Thermal Management: Classification and New Exploration, 2006, 33rd International Symposium on Computer Architecture (ISCA'06).
[21] Christoforos E. Kozyrakis, et al. Dynamic management of TurboMode in modern multi-core chips, 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).
[22] Onur Mutlu, et al. Bottleneck identification and scheduling in multithreaded applications, 2012, ASPLOS XVII.
[23] Eduard Ayguadé, et al. Task Superscalar: An Out-of-Order Task Pipeline, 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.
[24] Stefanos Kaxiras, et al. Introducing DVFS-Management in a Full-System Simulator, 2013, 2013 IEEE 21st International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems.
[25] Alexander Aiken, et al. Legion: Expressing locality and independence with logical regions, 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[26] Onur Mutlu, et al. Utility-based acceleration of multithreaded applications on asymmetric CMPs, 2013, ISCA.
[27] Henry Hoffmann, et al. Application heartbeats: a generic interface for specifying program performance and goals in autonomous computing environments, 2010, ICAC '10.
[28] Engin Ipek, et al. Core fusion: accommodating software diversity in chip multiprocessors, 2007, ISCA '07.
[29] Laxmikant V. Kalé, et al. CHARM++: a portable concurrent object oriented system based on C++, 1993, OOPSLA '93.
[30] David A. Patterson, et al. A hardware evaluation of cache partitioning to improve utilization and energy-efficiency while preserving responsiveness, 2013, ISCA.
[31] Xiang Pan, et al. Booster: Reactive core acceleration for mitigating the effects of process variation and application imbalance in low-voltage chips, 2012, IEEE International Symposium on High-Performance Comp Architecture.
[32] Francisco J. Cazorla, et al. Making data prefetch smarter: Adaptive prefetching on POWER7, 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).
[33] José González, et al. Meeting points: Using thread criticality to adapt multicore hardware to parallel regions, 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[34] Onur Mutlu, et al. Accelerating critical section execution with asymmetric multi-core architectures, 2009, ASPLOS.
[35] James Reinders, et al. Intel threading building blocks - outfitting C++ for multi-core processor parallelism, 2007.
[36] Stijn Eyerman, et al. Criticality stacks: identifying critical threads in parallel programs using synchronization behavior, 2013, ISCA.
[37] Eduard Ayguadé, et al. PARSECSs: Evaluating the Impact of Task Parallelism in the PARSEC Benchmark Suite, 2016, ACM Trans. Archit. Code Optim.
[38] Stijn Eyerman, et al. Fine-grained DVFS using on-chip regulators, 2011, TACO.
[39] Scott A. Mahlke, et al. Embracing heterogeneity with dynamic core boosting, 2014, Conf. Computing Frontiers.
[40] Per Stenström, et al. Runtime-Guided Cache Coherence Optimizations in Multi-core Architectures, 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.
[41] Yale N. Patt, et al. MorphCore: An Energy-Efficient Microarchitecture for High Performance ILP and High Throughput TLP, 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[42] Dionisios N. Pnevmatikatos, et al. Prefetching and cache management using task lifetimes, 2013, ICS '13.
[43] Mateo Valero, et al. Runtime Aware Architectures, 2016, 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[44] Jung Ho Ahn, et al. McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures, 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[45] Trevor Mudge, et al. Reevaluating Fast Dual-Voltage Power Rail Switching Circuitry, 2012.
[46] Pradip Bose, et al. Crank it up or dial it down: Coordinated multiprocessor frequency and folding control, 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[47] Christoforos E. Kozyrakis, et al. Flexible architectural support for fine-grain scheduling, 2010, ASPLOS XV.
[48] Stefanos Kaxiras, et al. Interval-based models for run-time DVFS orchestration in superscalar processors, 2010, CF '10.