Compute Caches
David Blaauw | Reetuparna Das | Satish Narayanasamy | Arun Subramaniyan | Shaizeen Aga | Supreet Jeloka
[1] Ran Ginosar,et al. Computer Architecture with Associative Processor Replacing Last-Level Cache and SIMD Accelerator , 2013, IEEE Transactions on Computers.
[2] Hyesoon Kim,et al. BSSync: Processing Near Memory for Machine Learning Workloads with Bounded Staleness Consistency Models , 2015, 2015 International Conference on Parallel Architecture and Compilation (PACT).
[3] K. Pagiamtzis,et al. Content-addressable memory (CAM) circuits and architectures: a tutorial and survey , 2006, IEEE Journal of Solid-State Circuits.
[4] Naresh R. Shanbhag,et al. Energy-efficient and high throughput sparse distributed memory architecture , 2015, 2015 IEEE International Symposium on Circuits and Systems (ISCAS).
[5] Christoforos E. Kozyrakis,et al. Phoenix rebirth: Scalable MapReduce on a large-scale shared-memory system , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[6] Jung Ho Ahn,et al. McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[7] Anoop Gupta,et al. The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.
[8] Oded Lempel,et al. 2nd Generation Intel® Core Processor Family: Intel® Core i7, i5 and i3 , 2011, 2011 IEEE Hot Chips 23 Symposium (HCS).
[9] Bill Dally. Power, Programmability, and Granularity: The Challenges of ExaScale Computing , 2011, IPDPS.
[10] Lieven Eeckhout,et al. Sniper: Exploring the level of abstraction for scalable and accurate parallel multi-core simulation , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[11] Onur Mutlu,et al. Fast Bulk Bitwise AND and OR in DRAM , 2015, IEEE Computer Architecture Letters.
[12] Mark D. Hill,et al. Weak ordering—a new definition , 1998, ISCA '98.
[13] Meng-Fan Chang,et al. A Large $\sigma V_{\mathrm{TH}}$/VDD Tolerant Zigzag 8T SRAM With Area-Efficient Decoupled Differential Sensing and Fast Write-Back Scheme , 2011, IEEE Journal of Solid-State Circuits.
[14] Cong Xu,et al. Pinatubo: A processing-in-memory architecture for bulk bitwise operations in emerging non-volatile memories , 2016, 2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC).
[15] Naresh R. Shanbhag,et al. An energy-efficient VLSI architecture for pattern recognition via deep embedding of computation in SRAM , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[16] Christoforos E. Kozyrakis,et al. A case for intelligent RAM , 1997, IEEE Micro.
[17] S. Rixner,et al. Optimizing Kernel Block Memory Operations , 2006.
[18] David Blaauw,et al. A 28 nm Configurable Memory (TCAM/BCAM/SRAM) Using Push-Rule 6T Bit Cell Enabling Logic-in-Memory , 2016, IEEE Journal of Solid-State Circuits.
[19] Xi Yang,et al. Why nothing matters: the impact of zeroing , 2011, OOPSLA '11.
[20] Gu-Yeon Wei,et al. Profiling a Warehouse-Scale Computer , 2016, IEEE Micro.
[21] Per Stenström,et al. A novel approach to cache block reuse predictions , 2003, 2003 International Conference on Parallel Processing, 2003. Proceedings.
[22] David R. Kaeli,et al. Calculating Architectural Vulnerability Factors for Spatial Multi-Bit Transient Faults , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[23] Engin Ipek,et al. A resistive TCAM accelerator for data-intensive computing , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[24] Rachata Ausavarungnirun,et al. RowClone: Fast and energy-efficient in-DRAM bulk data copy and initialization , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[25] Kiyoung Choi,et al. PIM-enabled instructions: A low-overhead, locality-aware processing-in-memory architecture , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[26] Stephan Wong,et al. Cache-Based Memory Copy Hardware Accelerator for Multicore Systems , 2010, IEEE Transactions on Computers.
[27] Lieven Eeckhout,et al. Cooperative cache scrubbing , 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).