Pushing the limits of accelerator efficiency while retaining programmability
[1] Fadi J. Kurdahi,et al. MorphoSys: An Integrated Reconfigurable System for Data-Parallel and Computation-Intensive Applications, 2000, IEEE Trans. Computers.
[2] Kunle Olukotun,et al. Hardware system synthesis from Domain-Specific Languages , 2014, 2014 24th International Conference on Field Programmable Logic and Applications (FPL).
[3] Christoph Hagleitner,et al. Designing a Programmable Wire-Speed Regular-Expression Matching Accelerator , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[4] Paolo Faraboschi,et al. Custom-fit processors: letting applications define architectures , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.
[5] Jason Cong,et al. CHARM: a composable heterogeneous accelerator-rich microprocessor , 2012, ISLPED '12.
[6] Gu-Yeon Wei,et al. Shrink-Fit: A Framework for Flexible Accelerator Sizing , 2013, IEEE Computer Architecture Letters.
[7] Charlie Johnson,et al. IBM Power Edge of Network Processor: A Wire-Speed System on a Chip , 2011, IEEE Micro.
[8] Amin Ansari,et al. Bundled execution of recurring traces for energy-efficient general purpose processing , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[9] Karthikeyan Sankaralingam,et al. DySER: Unifying Functionality and Parallelism Specialization for Energy-Efficient Computing , 2012, IEEE Micro.
[10] Yale N. Patt,et al. MorphCore: An Energy-Efficient Microarchitecture for High Performance ILP and High Throughput TLP , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[11] Norman P. Jouppi,et al. CACTI 6.0: A Tool to Model Large Caches, 2009.
[12] Ran Ginosar,et al. Generalized MultiAmdahl: Optimization of Heterogeneous Multi-Accelerator SoC , 2014, IEEE Computer Architecture Letters.
[13] Karthikeyan Sankaralingam,et al. Breaking SIMD shackles with an exposed flexible microarchitecture and the access execute PDG , 2013, Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques.
[14] Kenneth A. Ross,et al. Q100: the architecture and design of a database processing unit , 2014, ASPLOS.
[15] Ron K. Cytron,et al. A Scalable Architecture For High-Throughput Regular-Expression Pattern Matching , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).
[16] Hsien-Hsin S. Lee,et al. Extending Amdahl's Law for Energy-Efficient Computing in the Many-Core Era , 2008, Computer.
[17] William J. Dally,et al. Smart Memories: a modular reconfigurable architecture , 2000, ISCA '00.
[18] Kunle Olukotun,et al. A Heterogeneous Parallel Framework for Domain-Specific Languages , 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.
[19] Saurabh Dighe,et al. An 80-Tile 1.28TFLOPS Network-on-Chip in 65nm CMOS , 2007, 2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers.
[20] Gu-Yeon Wei,et al. The accelerator store: A shared memory framework for accelerator-based systems , 2012, TACO.
[21] Robert P. Colwell,et al. The chip design game at the end of Moore's law , 2013, 2013 IEEE Hot Chips 25 Symposium (HCS).
[22] Arvind,et al. Executing a Program on the MIT Tagged-Token Dataflow Architecture , 1990, IEEE Trans. Computers.
[23] David A. Wood,et al. WiDGET: Wisconsin decoupled grid execution tiles , 2010, ISCA.
[24] Luis Ceze,et al. Neural Acceleration for General-Purpose Approximate Programs , 2014, IEEE Micro.
[25] Kunle Olukotun,et al. Implementing Domain-Specific Languages for Heterogeneous Parallel Computing , 2011, IEEE Micro.
[26] Ninghui Sun,et al. DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning , 2014, ASPLOS.
[27] Jason Cong,et al. Composable accelerator-rich microprocessor enhanced for adaptivity and longevity , 2013, International Symposium on Low Power Electronics and Design (ISLPED).
[28] Rachel Courtland. The end of the shrink , 2013, IEEE Spectrum.
[29] Christoforos E. Kozyrakis,et al. Convolution engine: balancing efficiency & flexibility in specialized computing , 2013, ISCA.
[30] Christopher Batten,et al. The vector-thread architecture, 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004.
[31] Scott A. Mahlke,et al. Application-Specific Processing on a General-Purpose Core via Transparent Instruction Set Customization , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).
[32] Jonathan S. Turner,et al. Packet classification using extended TCAMs, 2003, 11th IEEE International Conference on Network Protocols, 2003. Proceedings.
[33] Kunle Olukotun,et al. OptiML: An Implicitly Parallel Domain-Specific Language for Machine Learning , 2011, ICML.
[34] Harish Patil,et al. Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.
[35] Xuehai Zhou,et al. PuDianNao: A Polyvalent Machine Learning Accelerator , 2015, ASPLOS.
[36] Joo-Young Kim,et al. A Scalable High-Bandwidth Architecture for Lossless Compression on FPGAs , 2015, 2015 IEEE 23rd Annual International Symposium on Field-Programmable Custom Computing Machines.
[37] Joseph Yiu,et al. The definitive guide to the ARM Cortex-M3, 2007.
[38] Yoav Etsion,et al. Single-graph multiple flows: Energy efficient design alternative for GPGPUs , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).
[39] Christoforos E. Kozyrakis,et al. Understanding sources of inefficiency in general-purpose chips , 2010, ISCA.
[40] Karthikeyan Sankaralingam,et al. A general constraint-centric scheduling framework for spatial architectures , 2013, PLDI.
[41] Kenneth A. Ross,et al. Navigating big data with high-throughput, energy-efficient data partitioning , 2013, ISCA.
[42] Babak Falsafi,et al. Meet the walkers: accelerating index traversals for in-memory databases, 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).