Performance Portability Across Heterogeneous SoCs Using a Generalized Library-Based Approach
Huawei Li | Lieven Eeckhout | Olivier Temam | Yang Chen | Zidong Du | Yunji Chen | Chengyong Wu | Yuntan Fang | Yuanjie Huang | Shuangde Fang
[1] Paolo Ienne,et al. Elastic CGRAs , 2013, FPGA '13.
[2] Luca Benini,et al. Component selection and matching for IP-based design , 2001, Proceedings Design, Automation and Test in Europe. Conference and Exhibition 2001.
[3] Karthikeyan Sankaralingam,et al. Dark Silicon and the End of Multicore Scaling , 2012, IEEE Micro.
[4] Rudolf Eigenmann,et al. OpenMPC: Extended OpenMP Programming and Tuning for GPUs , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[5] Zhen Wang,et al. Reflex: using low-power processors in smartphones without knowing them , 2012, ASPLOS XVII.
[6] Luis Ceze,et al. Architecture support for disciplined approximate programming , 2012, ASPLOS XVII.
[7] K. McStay,et al. Scaling deep trench based eDRAM on SOI to 32nm and Beyond , 2009, 2009 IEEE International Electron Devices Meeting (IEDM).
[8] Martin C. Rinard. Probabilistic accuracy bounds for fault-tolerant computations that discard tasks , 2006, ICS '06.
[9] Henry Hoffmann,et al. Dynamic knobs for responsive power-aware computing , 2011, ASPLOS XVI.
[10] Dan Grossman,et al. EnerJ: approximate data types for safe and general low-power computation , 2011, PLDI '11.
[11] Lieven Eeckhout,et al. Iterative optimization for the data center , 2012, ASPLOS XVII.
[12] P. Hanrahan,et al. Sequoia: Programming the Memory Hierarchy , 2006, ACM/IEEE SC 2006 Conference (SC'06).
[13] Andrew Richards,et al. Programmability and performance portability aspects of heterogeneous multi-/manycore systems , 2012, 2012 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[14] George T. Heineman,et al. Component-Based Software Engineering: Putting the Pieces Together , 2001 .
[15] John Shalf,et al. SEJITS: Getting Productivity and Performance With Selective Embedded JIT Specialization , 2010 .
[16] Hyesoon Kim,et al. Qilin: Exploiting parallelism on heterogeneous multiprocessors with adaptive mapping , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[17] Luis Ceze,et al. Neural Acceleration for General-Purpose Approximate Programs , 2014, IEEE Micro.
[18] R. Dolbeau,et al. HMPP™: A Hybrid Multi-core Parallel Programming Environment , 2022 .
[19] Mark D. Corner,et al. Eon: a language and runtime system for perpetual systems , 2007, SenSys '07.
[20] Alan Edelman,et al. Language and compiler support for auto-tuning variable-accuracy algorithms , 2011, International Symposium on Code Generation and Optimization (CGO 2011).
[21] Laxmikant V. Kalé,et al. CHARM++: a portable concurrent object oriented system based on C++ , 1993, OOPSLA '93.
[22] Olivier Temam,et al. A defect-tolerant accelerator for emerging high-performance applications , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).
[23] Henry Hoffmann,et al. Power Optimization in Embedded Systems via Feedback Control of Resource Allocation , 2013, IEEE Transactions on Control Systems Technology.
[24] Fei Xie,et al. Component-Based Hardware/Software Co-Simulation , 2007, 10th Euromicro Conference on Digital System Design Architectures, Methods and Tools (DSD 2007).
[25] Saman P. Amarasinghe,et al. Portable performance on heterogeneous architectures , 2013, ASPLOS '13.
[26] Martin Rinard,et al. Using Code Perforation to Improve Performance, Reduce Energy Consumption, and Respond to Failures , 2009 .
[27] Somayeh Sardashti,et al. The gem5 simulator , 2011, CARN.
[28] Zoltán Ádám Mann,et al. Extending component-based design with hardware components , 2005, Sci. Comput. Program..
[29] Lingamneni Avinash,et al. Highly energy and performance efficient embedded computing through approximately correct arithmetic: a mathematical foundation and preliminary experimental validation , 2008, CASES '08.
[30] Lingamneni Avinash,et al. Energy parsimonious circuit design through probabilistic pruning , 2011, 2011 Design, Automation & Test in Europe.
[31] Julio Gonzalo,et al. A comparison of extrinsic clustering evaluation metrics based on formal constraints , 2009, Information Retrieval.
[32] David A. Padua,et al. Performance Portability with the Chapel Language , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.
[33] Arcot Sowmya,et al. Automatic component matching using forced simulation , 2000, VLSI Design 2000. Wireless and Digital Imaging in the Millennium. Proceedings of 13th International Conference on VLSI Design.
[34] Gregory Diamos,et al. Harmony: an execution model and runtime for heterogeneous many core systems , 2008, HPDC '08.
[35] Teresa H. Y. Meng,et al. Merge: a programming model for heterogeneous multi-core systems , 2008, ASPLOS.
[36] Eero P. Simoncelli,et al. Image quality assessment: from error visibility to structural similarity , 2004, IEEE Transactions on Image Processing.
[37] Henry Hoffmann,et al. Quality of service profiling , 2010, 2010 ACM/IEEE 32nd International Conference on Software Engineering.
[38] Cédric Augonnet,et al. StarPU: a unified platform for task scheduling on heterogeneous multicore architectures , 2011, Concurr. Comput. Pract. Exp..
[39] David F. Bacon,et al. Compiling a high-level language for GPUs: (via language support for architectures and compilers) , 2012, PLDI.
[40] Ulf Schlichtmann,et al. Accurately timed transaction level models for virtual prototyping at high abstraction level , 2012, 2012 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[41] M. Valero,et al. Fuzzy memoization for floating-point multimedia applications , 2005, IEEE Transactions on Computers.
[42] Harish Patil,et al. Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.
[43] Andrew B. Kahng,et al. ORION 2.0: A fast and accurate NoC power and area model for early-stage design space exploration , 2009, 2009 Design, Automation & Test in Europe Conference & Exhibition.
[44] Kevin Skadron,et al. Rodinia: A benchmark suite for heterogeneous computing , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[45] David Villa,et al. Unified Inter-Communication Architecture for Systems-on-Chip , 2007, 18th IEEE/IFIP International Workshop on Rapid System Prototyping (RSP '07).
[46] Lieven Eeckhout,et al. SWAP: Parallelization through Algorithm Substitution , 2012, IEEE Micro.
[47] A. Choudhary,et al. A Library Based Compiler to Execute Matlab Programs on a Heterogeneous Platform , 2007 .
[48] Kunle Olukotun,et al. A Heterogeneous Parallel Framework for Domain-Specific Languages , 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.
[49] Benoît Meister,et al. Runnemede: An architecture for Ubiquitous High-Performance Computing , 2013, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).
[50] N. Metropolis,et al. Equation of State Calculations by Fast Computing Machines , 1953, Resonance.
[51] Nan Jiang,et al. A detailed and flexible cycle-accurate Network-on-Chip simulator , 2013, 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[52] Gabriela Nicolescu,et al. Component-based design approach for multicore SoCs , 2002, DAC '02.
[53] Woongki Baek,et al. Green: a framework for supporting energy-conscious programming using controlled approximation , 2010, PLDI '10.
[54] Thomas Gschwind,et al. Composing Distributed Components with the Component Workbench , 2002, SEM.
[55] Alan Edelman,et al. PetaBricks: a language and compiler for algorithmic choice , 2009, PLDI '09.
[56] Karthik Pattabiraman,et al. Flicker: Saving Refresh-Power in Mobile Devices through Critical Data Partitioning , 2009 .
[57] Michael F. P. O'Boyle,et al. Portable mapping of data parallel programs to OpenCL for heterogeneous systems , 2013, Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[58] Karthikeyan Sankaralingam,et al. Relax: an architectural framework for software recovery of hardware faults , 2010, ISCA.
[59] Alberto L. Sangiovanni-Vincentelli,et al. Addressing the system-on-a-chip interconnect woes through communication-based design , 2001, Proceedings of the 38th Design Automation Conference (IEEE Cat. No.01CH37232).
[60] Hyesoon Kim,et al. An integrated GPU power and performance model , 2010, ISCA.
[61] Surendra Byna,et al. Data-aware scheduling of legacy kernels on heterogeneous platforms with distributed memory , 2010, SPAA '10.