Performance evaluation of a DySER FPGA prototype system spanning the compiler, microarchitecture, and hardware implementation
Karthikeyan Sankaralingam | Preeti Agarwal | Venkatraman Govindaraju | Chen-Han Ho | Zachary Marzec | Tony Nowatzki | Ryan Cofell | Chris Frericks | Ranjini Nagaraju
[1] Chen-Han Ho. Mechanisms Towards Energy-Efficient Dynamic Hardware Specialization , 2014 .
[2] John Wawrzynek,et al. The Garp Architecture and C Compiler , 2000, Computer.
[3] Yoav Etsion,et al. Single-graph multiple flows: Energy efficient design alternative for GPGPUs , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).
[4] Karthikeyan Sankaralingam,et al. LEAP: Latency- energy- and area-optimized lookup pipeline , 2012, 2012 ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS).
[5] J. M. Codina,et al. SoftHV: a HW/SW co-designed processor with horizontal and vertical fusion , 2011, CF '11.
[6] Amin Ansari,et al. Bundled execution of recurring traces for energy-efficient general purpose processing , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[7] Henry Hoffmann,et al. Evaluation of the Raw microprocessor: an exposed-wire-delay architecture for ILP and streams , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004.
[8] Seth Copen Goldstein,et al. Virtualization on the Tartan Reconfigurable Architecture , 2007, 2007 International Conference on Field Programmable Logic and Applications.
[9] Christoforos E. Kozyrakis,et al. Understanding sources of inefficiency in general-purpose chips , 2010, ISCA.
[10] Carl Ebeling,et al. RaPiD - Reconfigurable Pipelined Datapath , 1996, FPL.
[11] Scott A. Mahlke,et al. VEAL: Virtualized Execution Accelerator for Loops , 2008, 2008 International Symposium on Computer Architecture.
[12] Athanasios Kakarountas,et al. Efficient High-Performance ASIC Implementation of JPEG-LS Encoder , 2007, 2007 Design, Automation & Test in Europe Conference & Exhibition.
[13] Karthikeyan Sankaralingam,et al. Universal Mechanisms for Data-Parallel Architectures , 2003, MICRO.
[14] Andreas Moshovos,et al. CHIMAERA: a high-performance architecture with a tightly-coupled reconfigurable functional unit , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[15] Venkatraman Govindaraju,et al. Prototyping the DySER specialization architecture with OpenSPARC , 2012, 2012 IEEE Hot Chips 24 Symposium (HCS).
[16] Gu-Yeon Wei,et al. HELIX-RC: An architecture-compiler co-design for automatic parallelization of irregular programs , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).
[17] Scott A. Mahlke,et al. Libra: Tailoring SIMD Execution Using Heterogeneous Hardware and Dynamic Configurability , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[18] Max Baron. The single-chip cloud computer , 2010 .
[19] David J. Kuck,et al. The Burroughs Scientific Processor (BSP) , 1982, IEEE Transactions on Computers.
[20] Babak Falsafi,et al. Meet the walkers accelerating index traversals for in-memory databases , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[21] Jeffrey R. Diamond,et al. An evaluation of the TRIPS computer system , 2009, ASPLOS.
[22] Fadi J. Kurdahi,et al. MorphoSys: An Integrated Reconfigurable System for Data-Parallel and Computation-Intensive Applications , 2000, IEEE Trans. Computers.
[23] Lizy Kurian John,et al. Scaling to the end of silicon with EDGE architectures , 2004, Computer.
[24] Sorin Lerner,et al. Automated soundness proofs for dataflow analyses and transformations via local rules , 2005, POPL '05.
[25] Karthikeyan Sankaralingam,et al. A general constraint-centric scheduling framework for spatial architectures , 2013, PLDI.
[26] Karthikeyan Sankaralingam,et al. Mechanisms for Parallelism Specialization for the DySER Architecture , 2012 .
[27] Kenneth A. Ross,et al. Navigating big data with high-throughput, energy-efficient data partitioning , 2013, ISCA.
[28] Michael Stepp,et al. Generating compiler optimizations from proofs , 2010, POPL '10.
[29] Karthikeyan Sankaralingam,et al. Design, integration and implementation of the DySER hardware accelerator into OpenSPARC , 2012, IEEE International Symposium on High-Performance Computer Architecture.
[30] Kenneth A. Ross,et al. Q100: the architecture and design of a database processing unit , 2014, ASPLOS.
[31] Seth Copen Goldstein,et al. PipeRench: A Reconfigurable Architecture and Compiler , 2000, Computer.
[32] Pradeep Dubey,et al. Larrabee: A Many-Core x86 Architecture for Visual Computing , 2009, IEEE Micro.
[33] Al Davis,et al. A loop accelerator for low power embedded VLIW processors , 2004, CODES+ISSS '04.
[34] Michael A. Schuette,et al. The Reconfigurable Streaming Vector Processor (RSVP™) , 2003, MICRO.
[35] John Wawrzynek,et al. Garp: a MIPS processor with a reconfigurable coprocessor , 1997, Proceedings. The 5th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (Cat. No.97TB100186).
[36] Mary Lou Soffa,et al. An approach for exploring code improving transformations , 1997, TOPL.
[37] Todd M. Austin,et al. CryptoManiac: a fast flexible architecture for secure communication , 2001, Proceedings 28th Annual International Symposium on Computer Architecture.
[38] Karthikeyan Sankaralingam,et al. DySER: Unifying Functionality and Parallelism Specialization for Energy-Efficient Computing , 2012, IEEE Micro.
[39] Hyunseok Lee,et al. SODA: A Low-power Architecture For Software Radio , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).
[40] Steven W. K. Tjiang,et al. Sharlit—a tool for building optimizers , 1992, PLDI '92.
[41] Scott A. Mahlke,et al. An architecture framework for transparent instruction set customization in embedded processors , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).
[42] Venkatraman Govindaraju,et al. Energy efficient computing through compiler assisted dynamic specialization , 2014 .
[43] Karthikeyan Sankaralingam,et al. Dynamically Specialized Datapaths for energy efficient computing , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.
[44] James E. Smith,et al. Vector instruction set support for conditional operations , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[45] Seth Copen Goldstein,et al. Tartan: evaluating spatial computation for whole program execution , 2006, ASPLOS XII.
[46] Ricardo E. Gonzalez,et al. Xtensa: A Configurable and Extensible Processor , 2000, IEEE Micro.
[47] Georgi Gaydadjiev,et al. SAMS multi-layout memory: providing multiple views of data to boost SIMD performance , 2010, ICS '10.
[48] Mark Horowitz,et al. Energy-performance tradeoffs in processor architecture and circuit design: a marginal cost analysis , 2010, ISCA.
[49] Henry Hoffmann,et al. The Raw Microprocessor: A Computational Fabric for Software Circuits and General-Purpose Programs , 2002, IEEE Micro.
[50] Karthikeyan Sankaralingam,et al. Breaking SIMD shackles with an exposed flexible microarchitecture and the access execute PDG , 2013, Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques.
[51] Sanjay J. Patel,et al. Rigel: an architecture and scalable programming interface for a 1000-core accelerator , 2009, ISCA '09.
[52] Feng Ji,et al. RSVM: A Region-based Software Virtual Memory for GPU , 2013, Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques.
[53] Eric Rotenberg,et al. FabScalar: Composing synthesizable RTL designs of arbitrary cores within a canonical superscalar template , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).
[54] Luis Ceze,et al. Neural Acceleration for General-Purpose Approximate Programs , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[55] Pradeep Dubey,et al. Can traditional programming bridge the Ninja performance gap for parallel computing applications? , 2012, ISCA 2012.
[56] William J. Dally,et al. Evaluating the Imagine stream architecture , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004.
[57] Christoforos E. Kozyrakis,et al. Convolution engine: balancing efficiency & flexibility in specialized computing , 2013, ISCA.
[58] Christopher Batten,et al. The vector-thread architecture , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004.
[59] Scott A. Mahlke,et al. Application-Specific Processing on a General-Purpose Core via Transparent Instruction Set Customization , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).
[60] William J. Dally,et al. The GPU Computing Era , 2010, IEEE Micro.
[61] M. Oskin,et al. The Microarchitecture of a Pipelined WaveScalar Processor: An RTL-based Study , 2005 .
[62] M. Pharr,et al. ispc: A SPMD compiler for high-performance CPU programming , 2012, 2012 Innovative Parallel Computing (InPar).