Fulcrum: A Simplified Control and Access Mechanism Toward Flexible and Practical In-Situ Accelerators
Kevin Skadron | Mircea R. Stan | Patricia Gonzalez-Guerrero | Shuangchen Li | Yuan Xie | Elaheh Sadredini | Sean Eilert | Ameen Akel | Marzieh Lenjani
[1] Scott A. Mahlke, et al. In-Memory Data Parallel Processor, 2018, ASPLOS.
[2] Mark Oskin, et al. Active Page Architectures for Media Processing, 1999.
[3] David A. Patterson, et al. Direction-optimizing Breadth-First Search, 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[4] Kevin Skadron, et al. Dymaxion: Optimizing memory access patterns for heterogeneous systems, 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[5] Nam Sung Kim, et al. GPUWattch: enabling energy optimizations in GPGPUs, 2013, ISCA.
[6] Sudhakar Yalamanchili, et al. Demystifying the characteristics of 3D-stacked memories: A case study for Hybrid Memory Cube, 2017, 2017 IEEE International Symposium on Workload Characterization (IISWC).
[7] Tomofumi Yuki, et al. Sparse computation data dependence simplification for efficient compiler-generated inspectors, 2019, PLDI.
[8] Cong Xu, et al. Pinatubo: A processing-in-memory architecture for bulk bitwise operations in emerging non-volatile memories, 2016, 2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC).
[9] Michel Barlaud, et al. Fast k nearest neighbor search using GPU, 2008, 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.
[10] Mahmoud Reza Hashemi, et al. Tree-based scheme for reducing shared cache miss rate leveraging regional, statistical and temporal similarities, 2014, IET Comput. Digit. Tech.
[11] Christoforos E. Kozyrakis, et al. Scalable Vector Processors for Embedded Systems, 2003, IEEE Micro.
[12] Mike Ignatowski, et al. TOP-PIM: throughput-oriented programmable processing in memory, 2014, HPDC '14.
[13] Tao Zhang, et al. PRIME: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-Based Main Memory, 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[14] David Blaauw, et al. Compute Caches, 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[15] Jung Ho Ahn, et al. CACTI-3DD: Architecture-level modeling for 3D die-stacked DRAM main memory, 2012, 2012 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[16] Jun Yang, et al. DrAcc: a DRAM based Accelerator for Accurate CNN Inference, 2018, 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC).
[17] Neal Cardwell, et al. Evaluation of Existing Architectures in IRAM Systems, 1998.
[18] Babak Falsafi, et al. The Mondrian data engine, 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[19] Kevin Skadron, et al. eAP: A Scalable and Efficient In-Memory Accelerator for Automata Processing, 2019, MICRO.
[20] Tao Zhang, et al. Half-DRAM: A high-bandwidth and low-power DRAM architecture from the rethinking of fine-grained activation, 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).
[21] Frederic T. Chong, et al. Reducing cost and tolerating defects in page-based intelligent memory, 2000, Proceedings 2000 International Conference on Computer Design.
[22] Gu-Yeon Wei, et al. Profiling a warehouse-scale computer, 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[23] Gabriel H. Loh, et al. Challenges of High-Capacity DRAM Stacks and Potential Directions, 2018, MCHPC@SC.
[24] Kesheng Wu, et al. FastBit: An Efficient Indexing Technology For Accelerating Data-Intensive Science, 2005.
[25] Onur Mutlu, et al. Low-Cost Inter-Linked Subarrays (LISA): Enabling fast inter-subarray data movement in DRAM, 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[26] Tom W. Chen, et al. Assessing merged DRAM/Logic technology, 1999, Integr.
[27] Fong Pong, et al. Missing the Memory Wall: The Case for Processor/Memory Integration, 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).
[28] Engin Ipek, et al. Memristive Boltzmann machine: A hardware accelerator for combinatorial optimization and deep learning, 2017, 2017 Fifth Berkeley Symposium on Energy Efficient Electronic Systems & Steep Transistors Workshop (E3S).
[29] Babak Falsafi, et al. Clearing the clouds: a study of emerging scale-out workloads on modern hardware, 2012, ASPLOS XVII.
[30] Onur Mutlu, et al. Simultaneous Multi-Layer Access, 2016, ACM Trans. Archit. Code Optim.
[31] Onur Mutlu, et al. Transparent Offloading and Mapping (TOM): Enabling Programmer-Transparent Near-Data Processing in GPU Systems, 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[32] Kevin Skadron, et al. FlexAmata: A Universal and Efficient Adaption of Applications to Spatial Automata Processing Accelerators, 2020, ASPLOS.
[33] Yuan Xie, et al. SCOPE: A Stochastic Computing Engine for DRAM-Based In-Situ Accelerator, 2018, 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[34] Onur Mutlu, et al. Fast Bulk Bitwise AND and OR in DRAM, 2015, IEEE Computer Architecture Letters.
[35] Kevin Skadron, et al. Impala: Algorithm/Architecture Co-Design for In-Memory Multi-Stride Pattern Matching, 2020, 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[36] Yuan Xie, et al. DRISA: A DRAM-based Reconfigurable In-Situ Accelerator, 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[37] Noah Treuhaft, et al. Intelligent RAM (IRAM): the industrial setting, applications, and architectures, 1997, Proceedings International Conference on Computer Design VLSI in Computers and Processors.
[38] Miao Hu, et al. ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars, 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[39] Christoforos E. Kozyrakis, et al. A case for intelligent RAM, 1997, IEEE Micro.
[40] Aamer Jaleel, et al. ExTensor: An Accelerator for Sparse Tensor Algebra, 2019, MICRO.
[41] Onur Mutlu, et al. Tiered-latency DRAM: A low latency and low cost DRAM architecture, 2013, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).
[42] Mircea R. Stan, et al. An Overflow-free Quantized Memory Hierarchy in General-purpose Processors, 2019, 2019 IEEE International Symposium on Workload Characterization (IISWC).
[43] Rachata Ausavarungnirun, et al. RowClone: Fast and energy-efficient in-DRAM bulk data copy and initialization, 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[44] Song Han, et al. EIE: Efficient Inference Engine on Compressed Deep Neural Network, 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[45] Onur Mutlu, et al. Ambit: In-Memory Accelerator for Bulk Bitwise Operations Using Commodity DRAM Technology, 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[46] Tajana Simunic, et al. FloatPIM: In-Memory Acceleration of Deep Neural Network Training with High Precision, 2019, 2019 ACM/IEEE 46th Annual International Symposium on Computer Architecture (ISCA).