3D-DRAM Performance for Different OpenMP Scheduling Techniques in Multicore Systems
[1] J. M. Bull, et al. Measuring Synchronisation and Scheduling Overheads in OpenMP, 2007.
[2] Axel Jantsch, et al. A survey of memory architecture for 3D chip multi-processors, 2014, Microprocess. Microsystems.
[3] Mike Ignatowski, et al. A new perspective on processing-in-memory architecture design, 2013, MSPC '13.
[4] Kai Li, et al. The PARSEC benchmark suite: Characterization and architectural implications, 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[5] David Novo, et al. Position Paper: OpenMP scheduling on ARM big.LITTLE architecture, 2016.
[6] So-Ra Kim, et al. 8Gb 3D DDR3 DRAM using through-silicon-via technology, 2009, 2009 IEEE International Solid-State Circuits Conference - Digest of Technical Papers.
[7] Mike Ignatowski, et al. TOP-PIM: throughput-oriented programmable processing in memory, 2014, HPDC '14.
[8] Alejandro Duran, et al. Is the Schedule Clause Really Necessary in OpenMP?, 2003, WOMPAT.
[9] Kiyoung Choi, et al. A scalable processing-in-memory accelerator for parallel graph processing, 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[10] Krishna M. Kavi, et al. Exploring the Processing-in-Memory design space, 2017, J. Syst. Archit.
[11] Luca Benini, et al. Design space exploration for 3D-stacked DRAMs, 2011, 2011 Design, Automation & Test in Europe.
[12] Krishna M. Kavi, et al. Dataflow based Near Data Computing Achieves Excellent Energy Efficiency, 2017, HEART.
[13] Nanning Zheng, et al. 3D DRAM Design and Application to 3D Multicore Systems, 2009, IEEE Design & Test of Computers.
[14] Jaejin Lee, et al. High bandwidth memory(HBM) with TSV technique, 2016, 2016 International SoC Design Conference (ISOCC).
[15] Kevin Skadron, et al. A characterization of the Rodinia benchmark suite with comparison to contemporary CMP workloads, 2010, IEEE International Symposium on Workload Characterization (IISWC'10).
[16] David A. Patterson, et al. Computer Architecture: A Quantitative Approach, 1969.
[17] Kiyoung Choi, et al. PIM-enabled instructions: A low-overhead, locality-aware processing-in-memory architecture, 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[18] Joonyoung Kim, et al. HBM: Memory solution for bandwidth-hungry processors, 2014, 2014 IEEE Hot Chips 26 Symposium (HCS).
[19] Krishna M. Kavi, et al. Memory organizations for 3D-DRAMs and PCMs in processor memory hierarchy, 2015, J. Syst. Archit.
[20] Somayeh Sardashti, et al. The gem5 simulator, 2011, CARN.
[21] Young-Hyun Jun, et al. 8 Gb 3-D DDR3 DRAM Using Through-Silicon-Via Technology, 2009, IEEE Journal of Solid-State Circuits.
[22] Aamer Jaleel, et al. CAMEO: A Two-Level Memory Organization with Capacity of Main Memory and Flexibility of Hardware-Managed Cache, 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[23] Krishna M. Kavi, et al. DVFS Space Exploration in Power Constrained Processing-in-Memory Systems, 2017, ARCS.
[24] Rachata Ausavarungnirun, et al. Row buffer locality aware caching policies for hybrid memories, 2012, 2012 IEEE 30th International Conference on Computer Design (ICCD).
[25] Babak Falsafi, et al. Unison Cache: A Scalable and Effective Die-Stacked DRAM Cache, 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[26] Franz Franchetti, et al. 3D DRAM based application specific hardware accelerator for SpMV, 2016, 2016 IEEE High Performance Extreme Computing Conference (HPEC).
[27] Krishna M. Kavi, et al. HBM-Resident Prefetching for Heterogeneous Memory System, 2017, ARCS.
[28] R. A. A. Raof, et al. Performance Analysis of OpenMP Scheduling Type on Embarrassingly Parallel Matrix Multiplication Algorithm, 2017.
[29] Gabriel H. Loh, et al. 3D-Stacked Memory Architectures for Multi-core Processors, 2008, 2008 International Symposium on Computer Architecture.