Onur Mutlu | Brandon Lucia | Hongzhong Zheng | Saugata Ghose | Hasan Hassan | Minesh Patel | Krishna T. Malladi | Kevin Hsieh | Amirali Boroumand | Nastaran Hajinazar
[1] Gabriel H. Loh,et al. 3D-Stacked Memory Architectures for Multi-core Processors , 2008, 2008 International Symposium on Computer Architecture.
[2] Josep Torrellas,et al. BulkSC: bulk enforcement of sequential consistency , 2007, ISCA '07.
[3] Chun Chen,et al. The architecture of the DIVA processing-in-memory chip , 2002, ICS '02.
[4] Kiyoung Choi,et al. A scalable processing-in-memory accelerator for parallel graph processing , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[5] Mike Ignatowski,et al. TOP-PIM: throughput-oriented programmable processing in memory , 2014, HPDC '14.
[6] Babak Falsafi,et al. Meet the walkers: Accelerating index traversals for in-memory databases , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[7] Sarita V. Adve,et al. Efficient GPU synchronization without scopes: Saying no to complex consistency models , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[8] Yafei Dai,et al. Seraph: an efficient, low-cost system for concurrent graph processing , 2014, HPDC '14.
[9] Jung Ho Ahn,et al. NDA: Near-DRAM acceleration architecture leveraging commodity DRAM devices and standard memory modules , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).
[10] Dong Ping Zhang,et al. Scaling Deep Learning on Multiple In-Memory Processors , 2015 .
[11] Rachata Ausavarungnirun,et al. RowClone: Fast and energy-efficient in-DRAM bulk data copy and initialization , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[12] Kiyoung Choi,et al. PIM-enabled instructions: A low-overhead, locality-aware processing-in-memory architecture , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[13] Tze Meng Low,et al. 3D-Stacked Memory-Side Acceleration: Accelerator and System Design , 2014 .
[14] Somayeh Sardashti,et al. The gem5 simulator , 2011, CARN.
[15] Babak Falsafi,et al. Sort vs. Hash Join Revisited for Near-Memory Execution , 2015 .
[16] Onur Mutlu,et al. Transparent Offloading and Mapping (TOM): Enabling Programmer-Transparent Near-Data Processing in GPU Systems , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[17] Kunle Olukotun,et al. Transactional memory coherence and consistency , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004.
[18] Manos Athanassoulis,et al. Beyond the Wall: Near-Data Processing for Databases , 2015, DaMoN.
[19] Harold S. Stone,et al. A Logic-in-Memory Computer , 1970, IEEE Transactions on Computers.
[20] William J. Dally,et al. Smart Memories: a modular reconfigurable architecture , 2000, ISCA '00.
[21] Christoforos E. Kozyrakis,et al. A case for intelligent RAM , 1997, IEEE Micro.
[22] L. Castedo,et al. SAP HANA , 2014 .
[23] Christoforos E. Kozyrakis,et al. Practical Near-Data Processing for In-Memory Analytics Frameworks , 2015, 2015 International Conference on Parallel Architecture and Compilation (PACT).
[24] J. Jeddeloh,et al. Hybrid memory cube new DRAM architecture increases density and performance , 2012, 2012 Symposium on VLSI Technology (VLSIT).
[25] Guy E. Blelloch,et al. Ligra: a lightweight graph processing framework for shared memory , 2013, PPoPP '13.
[26] Onur Mutlu,et al. Fast Bulk Bitwise AND and OR in DRAM , 2015, IEEE Computer Architecture Letters.
[27] Burton H. Bloom,et al. Space/time trade-offs in hash coding with allowable errors , 1970, CACM.
[28] Gustavo Alonso,et al. Main-memory hash joins on multi-core CPUs: Tuning to the underlying hardware , 2012, 2013 IEEE 29th International Conference on Data Engineering (ICDE).
[29] David A. Wood,et al. Lazy release consistency for GPUs , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[30] Brandon Lucia,et al. DMP: deterministic shared memory multiprocessing , 2009, IEEE Micro.
[31] Daniel Sánchez,et al. Implementing Signatures for Transactional Memory , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).
[32] Onur Mutlu,et al. The Dirty-Block Index , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).
[33] Seung-Moon Yoo,et al. FlexRAM: toward an advanced intelligent memory system , 1999, Proceedings 1999 IEEE International Conference on Computer Design: VLSI in Computers and Processors (Cat. No.99CB37040).
[34] Janak H. Patel,et al. A low-overhead coherence solution for multiprocessors with private cache memories , 1984, ISCA '84.
[35] Josep Torrellas,et al. Automatically mapping code on an intelligent memory architecture , 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.
[36] Peter M. Kogge,et al. EXECUBE-A New Architecture for Scaleable MPPs , 1994, 1994 International Conference on Parallel Processing Vol. 1.
[37] Onur Mutlu,et al. Accelerating pointer chasing in 3D-stacked memory: Challenges, mechanisms, evaluation , 2016, 2016 IEEE 34th International Conference on Computer Design (ICCD).
[38] Onur Mutlu,et al. Simultaneous Multi-Layer Access , 2016, ACM Trans. Archit. Code Optim..
[39] Feifei Li,et al. NDC: Analyzing the impact of 3D-stacked memory+logic devices on MapReduce workloads , 2014, 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[40] Michael M. Swift,et al. Efficient virtual memory for big memory servers , 2013, ISCA.
[41] Snehasish Kumar,et al. Fusion: Design tradeoffs in coherent cache hierarchies for accelerators , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[42] Norman P. Jouppi,et al. Optimizing NUCA Organizations and Wiring Alternatives for Large Caches with CACTI 6.0 , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).
[43] David A. Wood,et al. Heterogeneous system coherence for integrated CPU-GPU systems , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[44] M. Oskin,et al. Active Pages: a computation model for intelligent memory , 1998, Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235).
[45] Michael Stonebraker,et al. The VoltDB Main Memory DBMS , 2013, IEEE Data Eng. Bull..
[46] David A. Wood,et al. Synchronization Using Remote-Scope Promotion , 2015, ASPLOS.