Lightweight SIMT core designs for intelligent 3D stacked DRAM
[1] Duncan G. Elliott, et al. Computational RAM: A Memory-SIMD Hybrid and Its Application to DSP, 1992, 1992 Proceedings of the IEEE Custom Integrated Circuits Conference.
[2] Gabriel H. Loh. Computer architecture for die stacking, 2012, Proceedings of Technical Program of 2012 VLSI Technology, System and Application.
[3] Reena Panda, et al. Prefetching Techniques for Near-memory Throughput Processors, 2016, ICS.
[4] Mingyu Gao, et al. HRL: Efficient and flexible reconfigurable logic for near-data processing, 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[5] Scott A. Mahlke, et al. An architecture framework for transparent instruction set customization in embedded processors, 2005, 32nd International Symposium on Computer Architecture (ISCA'05).
[6] Pradeep Dubey, et al. Fast sort on CPUs and GPUs: a case for bandwidth oblivious SIMD sort, 2010, SIGMOD Conference.
[7] Mike Ignatowski, et al. TOP-PIM: throughput-oriented programmable processing in memory, 2014, HPDC '14.
[8] Russell Tessier, et al. FlexGrip: A soft GPGPU for FPGAs, 2013, 2013 International Conference on Field-Programmable Technology (FPT).
[9] Jing Li, et al. Boosting the Performance of FPGA-based Graph Processor using Hybrid Memory Cube: A Case for Breadth First Search, 2017, FPGA.
[10] Kiyoung Choi, et al. A scalable processing-in-memory accelerator for parallel graph processing, 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[11] Maya Gokhale, et al. Processing in Memory: The Terasys Massively Parallel PIM Array, 1995, Computer.
[12] Sudhakar Yalamanchili, et al. Neurocube: A Programmable Digital Neuromorphic Architecture with High-Density 3D Memory, 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[13] Gu-Yeon Wei, et al. Aladdin: A pre-RTL, power-performance accelerator simulator enabling large design space exploration of customized architectures, 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).
[14] Timothy N. Miller, et al. NyuziRaster: Optimizing rasterizer performance and energy in the Nyuzi open source GPU, 2016, 2016 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[15] J. Jeddeloh, et al. Hybrid memory cube new DRAM architecture increases density and performance, 2012, 2012 Symposium on VLSI Technology (VLSIT).
[16] Andrew S. Grimshaw, et al. Scalable GPU graph traversal, 2012, PPoPP '12.
[17] John Wawrzynek, et al. Chisel: Constructing hardware in a Scala embedded language, 2012, DAC Design Automation Conference 2012.
[18] Mahmut T. Kandemir, et al. Scheduling techniques for GPU architectures with processing-in-memory capabilities, 2016, 2016 International Conference on Parallel Architecture and Compilation Techniques (PACT).
[19] Franz Franchetti, et al. 3D DRAM based application specific hardware accelerator for SpMV, 2016, 2016 IEEE High Performance Extreme Computing Conference (HPEC).
[20] Mayler G. A. Martins, et al. Open Cell Library in 15nm FreePDK Technology, 2015, ISPD.
[21] Kunle Olukotun, et al. Automatic Generation of Efficient Accelerators for Reconfigurable Hardware, 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).