Specialized Macro-Instructions for Von-Neumann Accelerators
[1] Yi Pan,et al. PLUG: flexible lookup modules for rapid deployment of new protocols in high-speed routers , 2009, SIGCOMM '09.
[2] Karthikeyan Sankaralingam,et al. A Graph-Based Program Representation for Analyzing Hardware Specialization Approaches , 2015, IEEE Computer Architecture Letters.
[3] Karthikeyan Sankaralingam,et al. LEAP: Latency- energy- and area-optimized lookup pipeline , 2012, 2012 ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS).
[4] Jung Ho Ahn,et al. McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[5] Scott A. Mahlke,et al. Polymorphic Pipeline Array: A flexible multicore accelerator with virtualized execution for mobile multimedia applications , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[6] James E. Smith,et al. The microarchitecture of superscalar processors , 1995, Proc. IEEE.
[7] Scott A. Mahlke,et al. Edge-centric modulo scheduling for coarse-grained reconfigurable architectures , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[8] Christoforos E. Kozyrakis,et al. Understanding sources of inefficiency in general-purpose chips , 2010, ISCA.
[9] Scott A. Mahlke,et al. Application-Specific Processing on a General-Purpose Core via Transparent Instruction Set Customization , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).
[10] Richard M. Russell,et al. The CRAY-1 computer system , 1978, CACM.
[11] Bill Dally. Power, Programmability, and Granularity: The Challenges of ExaScale Computing , 2011, IPDPS.
[12] Lei Zhang,et al. A General-Purpose Many-Accelerator Architecture Based on Dataflow Graph Clustering of Applications , 2014, Journal of Computer Science and Technology.
[13] Sanjay J. Patel,et al. Rigel: an architecture and scalable programming interface for a 1000-core accelerator , 2009, ISCA '09.
[14] Karthikeyan Sankaralingam,et al. Performance evaluation of a DySER FPGA prototype system spanning the compiler, microarchitecture, and hardware implementation , 2015, 2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[15] Amin Ansari,et al. Illusionist: Transforming lightweight cores into aggressive cores on demand , 2013, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).
[16] Karthikeyan Sankaralingam,et al. Dynamically Specialized Datapaths for energy efficient computing , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.
[17] Mikko H. Lipasti,et al. An approach for implementing efficient superscalar CISC processors , 2006, The Twelfth International Symposium on High-Performance Computer Architecture, 2006.
[18] Saman P. Amarasinghe,et al. Exploiting superword level parallelism with multimedia instruction sets , 2000, PLDI '00.
[19] Harish Patil,et al. Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.
[20] Steven Swanson,et al. QSCORES: Trading dark silicon for scalable energy efficiency with quasi-specific cores , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[21] Simha Sethumadhavan,et al. Distributed Microarchitectural Protocols in the TRIPS Prototype Processor , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[22] Scott A. Mahlke,et al. Composite Cores: Pushing Heterogeneity Into a Core , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[23] Steven Swanson,et al. Conservation cores: reducing the energy of mature computations , 2010, ASPLOS XV.
[24] Karthikeyan Sankaralingam,et al. Design, integration and implementation of the DySER hardware accelerator into OpenSPARC , 2012, IEEE International Symposium on High-Performance Computer Architecture.
[25] Scott A. Mahlke,et al. Trace based phase prediction for tightly-coupled heterogeneous cores , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[26] Karthikeyan Sankaralingam,et al. Exploring the potential of heterogeneous Von Neumann/dataflow execution models , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[27] David A. Patterson,et al. Computer Architecture: A Quantitative Approach , 1969 .
[28] Apala Guha,et al. Chainsaw: Von-neumann accelerators to leverage fused instruction chains , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[29] D. R. Fulkerson. Note on Dilworth’s decomposition theorem for partially ordered sets , 1956 .
[30] Ali Saidi,et al. The Reconfigurable Streaming Vector Processor (RSVP) , 2003 .
[31] Rudy Lauwereins,et al. Exploiting Loop-Level Parallelism on Coarse-Grained Reconfigurable Architectures Using Modulo Scheduling , 2003, DATE.
[32] William J. Dally,et al. A compile-time managed multi-level register file hierarchy , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[33] J. Sanchez,et al. Flexible compiler-managed L0 buffers for clustered VLIW processors , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36.
[34] Mark Horowitz,et al. Energy-performance tradeoffs in processor architecture and circuit design: a marginal cost analysis , 2010, ISCA.
[35] Karthikeyan Sankaralingam,et al. DySER: Unifying Functionality and Parallelism Specialization for Energy-Efficient Computing , 2012, IEEE Micro.
[36] Mikko H. Lipasti,et al. Revolver: Processor architecture for power efficient loop execution , 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).
[37] Gu-Yeon Wei,et al. Aladdin: A pre-RTL, power-performance accelerator simulator enabling large design space exploration of customized architectures , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).
[38] James R. Larus,et al. Efficient path profiling , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.
[39] William J. Dally,et al. Smart Memories: a modular reconfigurable architecture , 2000, ISCA '00.
[40] Milo M. K. Martin,et al. Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset , 2005, CARN.
[41] Ho-Seop Kim,et al. An instruction set and microarchitecture for instruction level distributed processing , 2002, Proceedings 29th Annual International Symposium on Computer Architecture.
[42] Lieven Eeckhout,et al. Automatic design of domain-specific instructions for low-power processors , 2015, 2015 IEEE 26th International Conference on Application-specific Systems, Architectures and Processors (ASAP).
[43] Michael Taylor. A landscape of the new dark silicon design regime , 2013 .
[44] Engin Ipek,et al. Core fusion: accommodating software diversity in chip multiprocessors , 2007, ISCA '07.
[45] Amin Ansari,et al. Bundled execution of recurring traces for energy-efficient general purpose processing , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[46] James D. Warnock,et al. Cell processor low-power design methodology , 2005, IEEE Micro.
[47] Karthikeyan Sankaralingam,et al. Analyzing Behavior Specialized Acceleration , 2016, ASPLOS.
[48] David A. Patterson,et al. The Renewed Case for the Reduced Instruction Set Computer: Avoiding ISA Bloat with Macro-Op Fusion for RISC-V , 2016, ArXiv.
[49] David Black-Schaffer,et al. Efficient Embedded Computing , 2008, Computer.
[50] Scott A. Mahlke,et al. DynaMOS: Dynamic schedule migration for heterogeneous cores , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[51] Ricardo E. Gonzalez,et al. Xtensa: A Configurable and Extensible Processor , 2000, IEEE Micro.