论文信息 - Shared resource management for efficient heterogeneous computing - 字舞流文

Shared resource management for efficient heterogeneous computing

[1] Nam Sung Kim,et al. Improving Throughput of Power-Constrained GPUs Using Dynamic Voltage/Frequency and Core Scaling , 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.

[2] Gary S. Tyson,et al. A modified approach to data cache management , 1995, MICRO 1995.

[3] Jens Sparsø,et al. A router architecture for connection-oriented service guarantees in the MANGO clockless network-on-chip , 2005, Design, Automation and Test in Europe.

[4] G. G. Stokes. "J." , 1890, The New Yale Book of Quotations.

[5] Hyesoon Kim,et al. TAP: A TLP-aware cache management policy for a CPU-GPU heterogeneous architecture , 2012, IEEE International Symposium on High-Performance Comp Architecture.

[6] Daniel A. Jiménez,et al. Dynamic branch prediction with perceptrons , 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.

[7] A. Snavely,et al. Symbiotic jobscheduling for a simultaneous mutlithreading processor , 2000, SIGP.

[8] Kees G. W. Goossens,et al. Trade Offs in the Design of a Router with Both Guaranteed and Best-Effort Services for Networks on Chip , 2003, DATE.

[9] Yan Solihin,et al. Counter-Based Cache Replacement and Bypassing Algorithms , 2008, IEEE Transactions on Computers.

[10] John Paul Shen,et al. Mitigating Amdahl's law through EPI throttling , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).

[11] Aamer Jaleel,et al. High performance cache replacement using re-reference interval prediction (RRIP) , 2010, ISCA.

[12] Chi-Ying Tsui,et al. Optimal link scheduling on improving best-effort and guaranteed services performance in network-on-chip systems , 2006, 2006 43rd ACM/IEEE Design Automation Conference.

[13] Chita R. Das,et al. Aérgia: exploiting packet latency slack in on-chip networks , 2010, ISCA.

[14] Yale N. Patt,et al. Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[15] Tobias Bjerregaard,et al. A survey of research and practices of Network-on-chip , 2006, CSUR.

[16] Sudhakar Yalamanchili,et al. Accelerating simulation of agent-based models on heterogeneous architectures , 2013, GPGPU@ASPLOS.

[17] Jian Li,et al. Dynamic power-performance adaptation of parallel computation on chip multiprocessors , 2006, The Twelfth International Symposium on High-Performance Computer Architecture, 2006..

[18] Grigori Fursin,et al. Predictive Runtime Code Scheduling for Heterogeneous Architectures , 2008, HiPEAC.

[19] Harish Patil,et al. Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.

[20] Yi Yang,et al. CPU-assisted GPGPU on fused CPU-GPU architectures , 2012, IEEE International Symposium on High-Performance Comp Architecture.

[21] Krste Asanovic,et al. Globally-Synchronized Frames for Guaranteed Quality-of-Service in On-Chip Networks , 2008, 2008 International Symposium on Computer Architecture.

[22] Wei Huang,et al. Processor-Memory Power Shifting for Multi-Core Systems , 2012 .

[23] Chita R. Das,et al. ViChaR: A Dynamic Virtual Channel Regulator for Network-on-Chip Routers , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[24] Aamer Jaleel,et al. Adaptive insertion policies for managing shared caches , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[25] Jongman Kim,et al. Virtualizing Virtual Channels for Increased Network-on-Chip Robustness and Upgradeability , 2012, 2012 IEEE Computer Society Annual Symposium on VLSI.

[26] Saman P. Amarasinghe,et al. Portable performance on heterogeneous architectures , 2013, ASPLOS '13.

[27] Sherief Reda,et al. Pack & Cap: Adaptive DVFS and thread packing under power caps , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[28] Sudhakar Yalamanchili,et al. Interconnection Networks: An Engineering Approach , 2002 .

[29] Jian Li,et al. Power-efficient time-sensitive mapping in heterogeneous systems , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).

[30] Kees Goossens,et al. AEthereal network on chip: concepts, architectures, and implementations , 2005, IEEE Design & Test of Computers.

[31] Lei Gao,et al. A dynamically-allocated virtual channel architecture with congestion awareness for on-chip routers , 2008, 2008 45th ACM/IEEE Design Automation Conference.

[32] Kevin Kai-Wei Chang,et al. Staged memory scheduling: Achieving high performance and scalability in heterogeneous systems , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).

[33] Hyesoon Kim,et al. Qilin: Exploiting parallelism on heterogeneous multiprocessors with adaptive mapping , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[34] Francisco J. Cazorla,et al. MLP-Aware Dynamic Cache Partitioning , 2008, HiPEAC.

[35] Onur Mutlu,et al. Preemptive Virtual Clock: A flexible, efficient, and cost-effective QOS scheme for networks-on-chip , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[36] Onur Mutlu,et al. Kilo-NOC: A heterogeneous network-on-chip architecture for scalability and service guarantees , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).

[37] Bishop Brock,et al. Architecting for power management: The IBM® POWER7™ approach , 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.

[38] Kees G. W. Goossens,et al. dAElite: A TDM NoC Supporting QoS, Multicast, and Fast Connection Set-Up , 2014, IEEE Transactions on Computers.

[39] G. Edward Suh,et al. A new memory monitoring scheme for memory-aware scheduling and partitioning , 2002, Proceedings Eighth International Symposium on High Performance Computer Architecture.

[40] Yvon Jégou,et al. Using virtual lines to enhance locality exploitation , 1994, ICS '94.

[41] Mahmut T. Kandemir,et al. PEPON: Performance-aware hierarchical power budgeting for NoC based multicores , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).

[42] Onur Mutlu,et al. Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[43] Li Shang,et al. Multi-Optimization power management for chip multiprocessors , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[44] Karthick Rajamani,et al. Power-performance management on an IBM POWER7 server , 2010, 2010 ACM/IEEE International Symposium on Low-Power Electronics and Design (ISLPED).

[45] Kees G. W. Goossens,et al. Aelite: A flit-synchronous Network on Chip with composable and predictable services , 2009, 2009 Design, Automation & Test in Europe Conference & Exhibition.

[46] Naehyuck Chang,et al. Accurate modeling and calculation of delay and energy overheads of dynamic voltage scaling in modern high-performance microprocessors , 2010, 2010 ACM/IEEE International Symposium on Low-Power Electronics and Design (ISLPED).

[47] John Kim,et al. Throughput-Effective On-Chip Networks for Manycore Accelerators , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.

[48] Natalie D. Enright Jerger,et al. Outstanding Research Problems in NoC Design: System, Microarchitecture, and Circuit Perspectives , 2009, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[49] Sudhakar Yalamanchili,et al. Cooperative boosting: needy versus greedy power management , 2013, ISCA.

[50] Michael F. P. O'Boyle,et al. Portable mapping of data parallel programs to OpenCL for heterogeneous systems , 2013, Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).

[51] Gary S. Tyson,et al. Region-based caching: an energy-delay efficient memory architecture for embedded processors , 2000, CASES '00.

[52] Xuejun Yang,et al. IPC-Based Cache Partitioning: An IPC-Oriented Dynamic Shared Cache Partitioning Mechanism , 2008, 2008 International Conference on Convergence and Hybrid Information Technology.

[53] Kai Ma,et al. DPPC: Dynamic power partitioning and capping in chip multiprocessors , 2011, 2011 IEEE 29th International Conference on Computer Design (ICCD).

[54] Kai Ma,et al. PGCapping: Exploiting power gating for power capping and core lifetime balancing in CMPs , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).

[55] Russell Tessier,et al. ASOC: a scalable, single-chip communications architecture , 2000, Proceedings 2000 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.PR00622).

[56] Rajiv Kapoor,et al. Pinpointing Representative Portions of Large Intel® Itanium® Programs with Dynamic Instrumentation , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).

[57] Sudhakar Yalamanchili,et al. Adaptive virtual channel partitioning for network-on-chip in heterogeneous architectures , 2013, ACM Trans. Design Autom. Electr. Syst..

[58] Hiroshi Sasaki,et al. Coordinated power-performance optimization in manycores , 2013, Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques.

[59] Yusuf Leblebici,et al. Quantitative modelling and comparison of communication schemes to guarantee quality-of-service in networks-on-chip , 2005, 2005 IEEE International Symposium on Circuits and Systems.

[60] Chita R. Das,et al. Application-aware prioritization mechanisms for on-chip networks , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[61] Tor M. Aamodt,et al. Complexity effective memory access scheduling for many-core accelerator architectures , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[62] Cédric Augonnet,et al. Data-Aware Task Scheduling on Multi-accelerator Based Platforms , 2010, 2010 IEEE 16th International Conference on Parallel and Distributed Systems.

[63] Onur Mutlu,et al. Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems , 2008, 2008 International Symposium on Computer Architecture.

[64] Kai Ma,et al. Scalable power control for many-core architectures running multi-threaded applications , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).

[65] Karthick Rajamani,et al. A performance-conserving approach for reducing peak power consumption in server systems , 2005, ICS '05.

[66] David I. August,et al. Automatic CPU-GPU communication management and optimization , 2011, PLDI '11.

[67] Mattan Erez,et al. A QoS-aware memory controller for dynamically balancing GPU and CPU bandwidth use in an MPSoC , 2012, DAC Design Automation Conference 2012.

[68] David H. Albonesi,et al. Selective cache ways: on-demand cache resource allocation , 1999, MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture.

[69] Efraim Rotem,et al. Power-Management Architecture of the Intel Microarchitecture Code-Named Sandy Bridge , 2012, IEEE Micro.

[70] José L. Sánchez,et al. Exploring NoC Virtualization Alternatives in CMPs , 2012, 2012 20th Euromicro International Conference on Parallel, Distributed and Network-based Processing.

[71] Margaret Martonosi,et al. An Analysis of Efficient Multi-Core Global Power Management Policies: Maximizing Performance for a Given Power Budget , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[72] Radu Marculescu,et al. Analysis and optimization of prediction-based flow control in networks-on-chip , 2008, TODE.

[73] Gabriel H. Loh,et al. Scalable Shared-Cache Management by Containing Thrashing Workloads , 2010, HiPEAC.

[74] Théodore Marescaux,et al. Introducing the SuperGT Network-on-Chip; SuperGT QoS: more than just GT , 2007, 2007 44th ACM/IEEE Design Automation Conference.

[75] Henry Hoffmann,et al. The Raw Microprocessor: A Computational Fabric for Software Circuits and General-Purpose Programs , 2002, IEEE Micro.

[76] Norman P. Jouppi,et al. Reconfigurable caches and their application to media processing , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[77] Fang Liu,et al. Understanding how off-chip memory bandwidth partitioning in Chip Multiprocessors affects system performance , 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.

[78] Jung Ho Ahn,et al. McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[79] Hsien-Hsin S. Lee,et al. Energy efficient D-TLB and data cache using semantic-aware multilateral partitioning , 2003, ISLPED '03.

[80] Onur Mutlu,et al. A Case for MLP-Aware Cache Replacement , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).

[81] Yuval Tamir,et al. Dynamically-Allocated Multi-Queue Buffers for VLSI Communication Switches , 1992, IEEE Trans. Computers.

[82] Sudhakar Yalamanchili,et al. Ocelot: A dynamic optimization framework for bulk-synchronous applications in heterogeneous systems , 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).

[83] Kees G. W. Goossens,et al. Networks on silicon: combining best-effort and guaranteed services , 2002, Proceedings 2002 Design, Automation and Test in Europe Conference and Exhibition.

[84] Kees G. W. Goossens,et al. A Design Flow for Application-Specific Networks on Chip with Guaranteed Performance to Accelerate SOC Design and Verification , 2005, Design, Automation and Test in Europe.

[85] Wolf-Dietrich Weber,et al. A quality-of-service mechanism for interconnection networks in system-on-chips , 2005, Design, Automation and Test in Europe.

[86] William J. Dally,et al. Principles and Practices of Interconnection Networks , 2004 .

[87] T. Sakurai,et al. Run-time voltage hopping for low-power real-time systems , 2000, Proceedings 37th Design Automation Conference.

[88] Kevin Skadron,et al. Rodinia: A benchmark suite for heterogeneous computing , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).

[89] Fabien Clermidy,et al. An asynchronous NOC architecture providing low latency service and its multi-level design framework , 2005, 11th IEEE International Symposium on Asynchronous Circuits and Systems.

[90] K.J. Nesbit,et al. AC/DC: an adaptive data cache prefetcher , 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004..

[91] Natalie D. Enright Jerger,et al. Achieving predictable performance through better memory controller placement in many-core CMPs , 2009, ISCA '09.

[92] Carole-Jean Wu,et al. SHiP: Signature-based Hit Predictor for high performance caching , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[93] Mahmut T. Kandemir,et al. SHARP control: Controlled shared cache management in chip multiprocessors , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[94] R. Marculescu,et al. Exploiting the routing flexibility for energy/performance aware mapping of regular NoC architectures , 2003, 2003 Design, Automation and Test in Europe Conference and Exhibition.

[95] José Duato,et al. A new scalable and cost-effective congestion management strategy for lossless multistage interconnection networks , 2005, 11th International Symposium on High-Performance Computer Architecture.

[96] Natalie D. Enright Jerger,et al. DBAR: An efficient routing algorithm to support multiple concurrent applications in networks-on-chip , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).

[97] Hao Wang,et al. Workload and power budget partitioning for single-chip heterogeneous processors , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).

[98] Christine A. Shoemaker,et al. Scalable thread scheduling and global power management for heterogeneous many-core architectures , 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).

[99] James E. Smith,et al. Data Cache Prefetching Using a Global History Buffer , 2005, IEEE Micro.

[100] Michael J. Schulte,et al. ERCBench: An Open-Source Benchmark Suite for Embedded and Reconfigurable Computing , 2010, 2010 International Conference on Field Programmable Logic and Applications.

[101] Wen-mei W. Hwu,et al. Run-time Adaptive Cache Hierarchy Via Reference Analysis , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.

[102] Yan Solihin,et al. Fair cache sharing and partitioning in a chip multiprocessor architecture , 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004..

[103] Mor Harchol-Balter,et al. Thread Cluster Memory Scheduling: Exploiting Differences in Memory Access Behavior , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.

[104] Ran Ginosar,et al. An asynchronous router for multiple service levels networks on chip , 2005, 11th IEEE International Symposium on Asynchronous Circuits and Systems.

[105] Mainak Chaudhuri,et al. Pseudo-LIFO: The foundation of a new family of replacement policies for last-level caches , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[106] Aamer Jaleel,et al. Adaptive insertion policies for high performance caching , 2007, ISCA '07.

[107] Mainak Chaudhuri,et al. Bypass and insertion algorithms for exclusive last-level caches , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).

[108] Gabriel H. Loh,et al. PIPP: promotion/insertion pseudo-partitioning of multi-core shared caches , 2009, ISCA '09.

[109] Soraya Ghiasi,et al. Scheduling for heterogeneous processors in server systems , 2005, CF '05.

[110] Ran Ginosar,et al. QNoC: QoS architecture and design process for network on chip , 2004, J. Syst. Archit..

[111] Teresa H. Y. Meng,et al. Merge: a programming model for heterogeneous multi-core systems , 2008, ASPLOS.

[112] Axel Jantsch,et al. Guaranteed bandwidth using looped containers in temporally disjoint networks within the nostrum network on chip , 2004, Proceedings Design, Automation and Test in Europe Conference and Exhibition.

[113] Jing Li,et al. Fast lock scheme for phase-locked loops , 2009, 2009 IEEE Custom Integrated Circuits Conference.

[114] Russell Tessier,et al. An architecture and compiler for scalable on-chip communication , 2004, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[115] Kai Ma,et al. Temperature-constrained power control for chip multiprocessors with online model estimation , 2009, ISCA '09.

[116] Onur Mutlu,et al. Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.

[117] Axel Jantsch,et al. Load distribution with the proximity congestion awareness in a network on chip , 2003, 2003 Design, Automation and Test in Europe Conference and Exhibition.

[118] R. Marculescu,et al. Traffic analysis for on-chip networks design of multimedia applications , 2002, Proceedings 2002 Design Automation Conference (IEEE Cat. No.02CH37324).

[119] Kevin Kai-Wei Chang,et al. HAT: Heterogeneous Adaptive Throttling for On-Chip Networks , 2012, 2012 IEEE 24th International Symposium on Computer Architecture and High Performance Computing.

[120] Kaushik Roy,et al. Gated-Vdd: a circuit technique to reduce leakage in deep-submicron cache memories , 2000, ISLPED '00.

[121] Timothy Mark Pinkston,et al. Evaluation of queue designs for true fully adaptive routers , 2004, J. Parallel Distributed Comput..

[122] Radu Marculescu,et al. DyAD - smart routing for networks-on-chip , 2004, Proceedings. 41st Design Automation Conference, 2004..

[123] Hsien-Hsin S. Lee,et al. Exploiting access semantics and program behavior to reduce snoop power in chip multiprocessors , 2008, ASPLOS.

[124] G. Edward Suh,et al. Dynamic Partitioning of Shared Cache Memory , 2004, The Journal of Supercomputing.

[125] Carole-Jean Wu,et al. PACMan: Prefetch-Aware Cache Management for high performance caching , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[126] Chita R. Das,et al. A case for heterogeneous on-chip interconnects for CMPs , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).