Understanding and Improving the Latency of DRAM-Based Memory Systems

Over the past two decades, the storage capacity and access bandwidth of main memory have improved tremendously, by 128x and 20x, respectively. These improvements are mainly due to the continuous technology scaling of DRAM (dynamic random-access memory), which has been used as the physical substrate for main memory. In stark contrast with capacity and bandwidth, DRAM latency has remained almost constant, improving by only 1.3x in the same time frame. Long DRAM latency therefore continues to be a critical performance bottleneck in modern systems. Increasing core counts and the emergence of increasingly data-intensive and latency-critical applications further stress the importance of providing low-latency memory access. In this dissertation, we identify three main problems that contribute significantly to the long latency of DRAM accesses. To address these problems, we present a series of new techniques that significantly improve both system performance and energy efficiency. We also examine the critical relationship between supply voltage and latency in modern DRAM chips and develop new mechanisms that exploit this voltage-latency trade-off to improve energy efficiency. The key conclusion of this dissertation is that augmenting the DRAM architecture with simple, low-cost features and developing a better understanding of manufactured DRAM chips together lead to significant reductions in memory latency as well as improvements in energy efficiency. We hope and believe that the proposed architectural techniques, together with the detailed experimental data and observations on real commodity DRAM chips presented in this dissertation, will enable the development of other new mechanisms to improve the performance, energy efficiency, or reliability of future memory systems.

[1]  J. E. Thornton,et al.  Parallel operation in the control data 6600 , 1964, AFIPS '64 (Fall, part II).

[2]  R. M. Tomasulo,et al.  An efficient algorithm for exploiting multiple arithmetic units , 1995 .

[3]  Burton H. Bloom,et al.  Space/time trade-offs in hash coding with allowable errors , 1970, CACM.

[4]  Harold S. Stone,et al.  A Logic-in-Memory Computer , 1970, IEEE Transactions on Computers.

[5]  David Kroft,et al.  Lockup-free instruction fetch/prefetch cache organization , 1998, ISCA '81.

[6]  Burton J. Smith Architecture And Applications Of The HEP Multiprocessor Computer System , 1982, Optics & Photonics.

[7]  Yale N. Patt,et al.  HPS, a new microarchitecture: rationale and introduction , 1985, MICRO 18.

[8]  B J Smith,et al.  A pipelined, shared resource MIMD computer , 1986 .

[9]  Paolo Antognetti,et al.  Semiconductor Device Modeling with Spice , 1988 .

[10]  John K. Ousterhout,et al.  Why Aren't Operating Systems Getting Faster As Fast as Hardware? , 1990, USENIX Summer.

[11]  Norman P. Jouppi,et al.  Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.

[12]  James E. Smith,et al.  Performance Of Cached Dram Organizations In Vector Supercomputers , 1993, Proceedings of the 20th Annual International Symposium on Computer Architecture.

[13]  Peter M. Kogge,et al.  EXECUBE-A New Architecture for Scaleable MPPs , 1994, 1994 International Conference on Parallel Processing Vol. 1.

[14]  Charles A. Hart CDRAM in a unified memory architecture , 1994, Proceedings of COMPCON '94.

[15]  Anoop Gupta,et al.  Scheduling and page migration for multiprocessor compute servers , 1994, ASPLOS VI.

[16]  Michel Dubois,et al.  Sequential Hardware Prefetching in Shared-Memory Multiprocessors , 1995, IEEE Trans. Parallel Distributed Syst..

[17]  Anoop Gupta,et al.  The impact of architectural trends on operating system performance , 1995, SOSP.

[18]  Jean-Loup Baer,et al.  Effective Hardware Based Data Prefetching for High-Performance Processors , 1995, IEEE Trans. Computers.

[19]  Anna R. Karlin,et al.  A study of integrated prefetching and caching strategies , 1995, SIGMETRICS '95/PERFORMANCE '95.

[20]  Dean M. Tullsen,et al.  Simultaneous multithreading: Maximizing on-chip parallelism , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.

[21]  Maya Gokhale,et al.  Processing in Memory: The Terasys Massively Parallel PIM Array , 1995, Computer.

[22]  Anoop Gupta,et al.  Operating system support for improving data locality on CC-NUMA compute servers , 1996, ASPLOS VII.

[23]  Mikko H. Lipasti,et al.  Exceeding the dataflow limit via value prediction , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.

[24]  Mikko H. Lipasti,et al.  Value locality and load value prediction , 1996, ASPLOS VII.

[25]  R. Jacob Baker,et al.  CMOS Circuit Design, Layout, and Simulation , 1997 .

[26]  Gershon Kedem,et al.  WCDRAM: A fully associative integrated Cached-DRAM with wide cache lines , 1997 .

[27]  Kai Wang,et al.  Highly accurate data value prediction using hybrid predictors , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[28]  Douglas J. Joseph,et al.  Prefetching Using Markov Predictors , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.

[29]  Christoforos E. Kozyrakis,et al.  A case for intelligent RAM , 1997, IEEE Micro.

[30]  James E. Smith,et al.  The predictability of data values , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[31]  Kazuaki Murakami,et al.  Optimizing the DRAM refresh count for merged DRAM/logic LSIs , 1998, Proceedings. 1998 International Symposium on Low Power Electronics and Design (IEEE Cat. No.98TH8379).

[32]  G.E. Moore,et al.  Cramming More Components Onto Integrated Circuits , 1998, Proceedings of the IEEE.

[33]  Frederic T. Chong,et al.  Active pages: a computation model for intelligent memory , 1998, ISCA.

[34]  Hiroyuki Kobayashi,et al.  Fast cycle RAM (FCRAM); a 20-ns random row access, pipe-lined operating DRAM , 1998, 1998 Symposium on VLSI Circuits. Digest of Technical Papers (Cat. No.98CH36215).

[35]  T. Hamamoto,et al.  On the retention time distribution of dynamic random access memory (DRAM) , 1998 .

[36]  Martin L. Kersten,et al.  Database Architecture Optimized for the New Bottleneck: Memory Access , 1999, VLDB.

[37]  Erik Brunvand,et al.  Impulse: building a smarter memory controller , 1999, Proceedings Fifth International Symposium on High-Performance Computer Architecture.

[38]  David J. DeWitt,et al.  DBMSs on a Modern Processor: Where Does Time Go? , 1999, VLDB.

[39]  R.H. Dennard,et al.  Design Of Ion-implanted MOSFET's with Very Small Physical Dimensions , 1974, Proceedings of the IEEE.

[40]  Dean M. Tullsen,et al.  Symbiotic jobscheduling for a simultaneous mutlithreading processor , 2000, SIGP.

[41]  S. Nassif,et al.  Delay variability: sources, impacts and trends , 2000, 2000 IEEE International Solid-State Circuits Conference. Digest of Technical Papers (Cat. No.00CH37056).

[42]  William J. Dally,et al.  Smart Memories: a modular reconfigurable architecture , 2000, ISCA '00.

[43]  William J. Dally,et al.  Memory access scheduling , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[44]  Brent Keeth,et al.  DRAM Circuit Design: A Tutorial , 2000 .

[45]  Manoj Franklin,et al.  Balancing thoughput and fairness in SMT processors , 2001, 2001 IEEE International Symposium on Performance Analysis of Systems and Software. ISPASS..

[46]  Zhen Fang,et al.  The Impulse Memory Controller , 2001, IEEE Trans. Computers.

[47]  H. Fujisawa,et al.  A multi-gigabit DRAM technology with 6F/sup 2/ open-bit-line cell distributed over-driven sensing and stacked-flash fuse , 2001, 2001 IEEE International Solid-State Circuits Conference. Digest of Technical Papers. ISSCC (Cat. No.01CH37177).

[48]  Babak Falsafi,et al.  Dead-block prediction & dead-block correlating prefetchers , 2001, ISCA 2001.

[49]  Marios C. Papaefthymiou,et al.  Block-based multi-period refresh for energy efficient dynamic memory , 2001, Proceedings 14th Annual IEEE International ASIC/SOC Conference (IEEE Cat. No.01TH8558).

[50]  Dirk Grunwald,et al.  A stateless, content-directed data prefetching mechanism , 2002, ASPLOS X.

[51]  Chun Chen,et al.  The architecture of the DIVA processing-in-memory chip , 2002, ICS '02.

[52]  Jose Renau,et al.  Programming the FlexRAM parallel intelligent memory system , 2003, PPoPP '03.

[53]  Onur Mutlu,et al.  Runahead Execution: An Effective Alternative to Large Instruction Windows , 2003, IEEE Micro.

[54]  J. Maiz,et al.  Characterization of multi-bit soft error events in advanced SRAMs , 2003, IEEE International Electron Devices Meeting 2003.

[55]  Onur Mutlu,et al.  Runahead execution: an alternative to very large instruction windows for out-of-order processors , 2003, The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings..

[56]  Rajiv Kapoor,et al.  Pinpointing Representative Portions of Large Intel® Itanium® Programs with Dynamic Instrumentation , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).

[57]  Calvin Lin,et al.  Adaptive History-Based Memory Schedulers , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).

[58]  D. Ielmini,et al.  Reliability study of phase-change nonvolatile memories , 2004, IEEE Transactions on Device and Materials Reliability.

[59]  M. Igeta,et al.  Comprehensive study of soft errors in advanced CMOS circuits with 90/130 nm technology , 2004, IEDM Technical Digest. IEEE International Electron Devices Meeting, 2004..

[60]  Guido Appenzeller,et al.  Sizing router buffers , 2004, SIGCOMM '04.

[61]  Said Hamdioui,et al.  Effects of bit line coupling on the faulty behavior of DRAMs , 2004, 22nd IEEE VLSI Test Symposium, 2004. Proceedings..

[62]  K.J. Nesbit,et al.  AC/DC: an adaptive data cache prefetcher , 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004..

[63]  Laxmi N. Bhuyan,et al.  Hardware support for bulk data movement in server platforms , 2005, 2005 International Conference on Computer Design.

[64]  Onur Mutlu,et al.  Address-value delta (AVD) prediction: increasing the effectiveness of runahead execution by exploiting regular memory allocation patterns , 2005, 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05).

[65]  Harish Patil,et al.  Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.

[66]  Onur Mutlu,et al.  Techniques for efficient processing in runahead execution engines , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).

[67]  Kunle Olukotun,et al.  Niagara: a 32-way multithreaded Sparc processor , 2005, IEEE Micro.

[68]  H. Puchner,et al.  Investigation of multi-bit upsets in a 150 nm technology SRAM device , 2005, IEEE Transactions on Nuclear Science.

[69]  H. Peter Hofstee,et al.  Introduction to the Cell multiprocessor , 2005, IBM J. Res. Dev..

[70]  Kinam Kim,et al.  Technology for sub-50nm DRAM and NAND flash manufacturing , 2005, IEEE InternationalElectron Devices Meeting, 2005. IEDM Technical Digest..

[71]  Eric Rotenberg,et al.  Retention-aware placement in DRAM (RAPID): software methods for quasi-non-volatile DRAM , 2006, The Twelfth International Symposium on High-Performance Computer Architecture, 2006..

[72]  James E. Smith,et al.  Fair Queuing Memory Systems , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[73]  Frank Mueller,et al.  Hardware profile-guided automatic page placement for ccNUMA systems , 2006, PPoPP '06.

[74]  J.D. Cressler,et al.  Multiple-Bit Upset in 130 nm CMOS Technology , 2006, IEEE Transactions on Nuclear Science.

[75]  Onur Mutlu,et al.  Efficient Runahead Execution: Power-Efficient Memory Latency Tolerance , 2006, IEEE Micro.

[76]  Yu Cao,et al.  New generation of predictive technology model for sub-45nm design exploration , 2006, 7th International Symposium on Quality Electronic Design (ISQED'06).

[77]  Onur Mutlu,et al.  Address-Value Delta (AVD) Prediction: A Hardware Technique for Efficiently Parallelizing Dependent Cache Misses , 2006, IEEE Transactions on Computers.

[78]  Michael Gschwind Chip multiprocessing and the cell broadband engine , 2006, CF '06.

[79]  Stamatis Vassiliadis,et al.  A hardware cache memcpy accelerator , 2006, 2006 IEEE International Conference on Field Programmable Technology.

[80]  Edward A. Lee The problem with threads , 2006, Computer.

[81]  Kiyoo Itoh,et al.  Vlsi Memory Chip Design , 2006 .

[82]  Calvin Lin,et al.  Memory Prefetching Using Adaptive Stream Detection , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[83]  Frederick A. Ware,et al.  Improving Power and Data Efficiency with Threaded Memory Modules , 2006, 2006 International Conference on Computer Design.

[84]  Said Hamdioui,et al.  Manifestation of Precharge Faults in High Speed DRAM Devices , 2007, 2007 IEEE Design and Diagnostics of Electronic Circuits and Systems.

[85]  Jun Shao,et al.  A Burst Scheduling Access Reordering Mechanism , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.

[86]  Onur Mutlu,et al.  Memory Performance Attacks: Denial of Memory Service in Multi-Core Systems , 2007, USENIX Security Symposium.

[87]  Onur Mutlu,et al.  Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[88]  Bianca Schroeder,et al.  Disk Failures in the Real World: What Does an MTTF of 1, 000, 000 Hours Mean to You? , 2007, FAST.

[89]  Shankar Pasupathy,et al.  An analysis of latent sector errors in disk drives , 2007, SIGMETRICS '07.

[90]  Eduardo Pinheiro,et al.  Failure Trends in a Large Disk Drive Population , 2007, FAST.

[91]  David A. Wood,et al.  Interactions Between Compression and Prefetching in Chip Multiprocessors , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.

[92]  Onur Mutlu,et al.  Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.

[93]  Aamer Jaleel,et al.  Adaptive insertion policies for high performance caching , 2007, ISCA '07.

[94]  William J. Dally,et al.  Architectural Support for the Stream Execution Model on General-Purpose Processors , 2007, 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007).

[95]  Masashi Horiguchi,et al.  Ultra-Low Voltage Nano-Scale Memories , 2007, Series on Integrated Circuits and Systems.

[96]  Kieran McLaughlin,et al.  An RLDRAM II Implementation of a 10Gbps Shared Packet Buffer for Network Processing , 2007, Second NASA/ESA Conference on Adaptive Hardware and Systems (AHS 2007).

[97]  Won-Taek Lim,et al.  Effective Management of DRAM Bandwidth in Multicore Processors , 2007, 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007).

[98]  Onur Mutlu,et al.  Prefetch-Aware DRAM Controllers , 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.

[99]  Andrea C. Arpaci-Dusseau,et al.  An analysis of data corruption in the storage stack , 2008, TOS.

[100]  Zhao Zhang,et al.  Gaining insights into multicore cache partitioning: Bridging the gap between simulation and real systems , 2008, 2008 IEEE 14th International Symposium on High Performance Computer Architecture.

[101]  Erik Lindholm,et al.  NVIDIA Tesla: A Unified Graphics and Computing Architecture , 2008, IEEE Micro.

[102]  Onur Mutlu,et al.  Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems , 2008, 2008 International Symposium on Computer Architecture.

[103]  Stijn Eyerman,et al.  System-Level Performance Metrics for Multiprogram Workloads , 2008, IEEE Micro.

[104]  Onur Mutlu,et al.  Distributed order scheduling and its application to multi-core dram controllers , 2008, PODC '08.

[105]  Zhao Zhang,et al.  Memory Access Scheduling Schemes for Systems with Multi-Core Processors , 2008, 2008 37th International Conference on Parallel Processing.

[106]  Zhao Zhang,et al.  Mini-rank: Adaptive DRAM architecture for improving memory power efficiency , 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.

[107]  Onur Mutlu,et al.  Self-Optimizing Memory Controllers: A Reinforcement Learning Approach , 2008, 2008 International Symposium on Computer Architecture.

[108]  Tao Li,et al.  Characterizing and mitigating the impact of process variations on phase change based memory systems , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[109]  Onur Mutlu,et al.  A case for bufferless routing in on-chip networks , 2009, ISCA '09.

[110]  Haoyu Song,et al.  Towards 100G packet processing: Challenges and technologies , 2009, Bell Labs Technical Journal.

[111]  Jung Ho Ahn,et al.  Multicore DIMM: an Energy Efficient Memory Module with Independently Controlled DRAMs , 2009, IEEE Computer Architecture Letters.

[112]  Onur Mutlu,et al.  Express Cube Topologies for on-Chip Interconnects , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.

[113]  Vijayalakshmi Srinivasan,et al.  Enhancing lifetime and security of PCM-based Main Memory with Start-Gap Wear Leveling , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[114]  Onur Mutlu,et al.  Improving memory Bank-Level Parallelism in the presence of prefetching , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[115]  Eduardo Pinheiro,et al.  DRAM errors in the wild: a large-scale field study , 2009, SIGMETRICS '09.

[116]  Lizy Kurian John,et al.  ESKIMO - energy savings using semantic knowledge of inconsequential memory occupancy for DRAM subsystem , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[117]  Yan Solihin,et al.  Architecture Support for Improving Bulk Memory Copying and Initialization Performance , 2009, 2009 18th International Conference on Parallel Architectures and Compilation Techniques.

[118]  Kinam Kim,et al.  A New Investigation of Data Retention Time in Truly Nanoscaled DRAMs , 2009, IEEE Electron Device Letters.

[119]  Jung Ho Ahn,et al.  McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[120]  P. Roche,et al.  Altitude and Underground Real-Time SER Characterization of CMOS 65 nm SRAM , 2008, IEEE Transactions on Nuclear Science.

[121]  Onur Mutlu,et al.  Techniques for bandwidth-efficient prefetching of linked data structures in hybrid prefetching systems , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.

[122]  Onur Mutlu,et al.  Architecting phase change memory as a scalable dram alternative , 2009, ISCA '09.

[123]  Onur Mutlu,et al.  Coordinated control of multiple prefetchers in multi-core systems , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[124]  Onur Mutlu,et al.  Preemptive Virtual Clock: A flexible, efficient, and cost-effective QOS scheme for networks-on-chip , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[125]  Vijayalakshmi Srinivasan,et al.  Scalable high performance main memory system using phase-change memory technology , 2009, ISCA '09.

[126]  Chita R. Das,et al.  Application-aware prioritization mechanisms for on-chip networks , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[127]  Margaret Martonosi,et al.  Thread criticality predictors for dynamic performance, power, and resource management in chip multiprocessors , 2009, ISCA '09.

[128]  Onur Mutlu,et al.  Phase change memory architecture and the quest for scalability , 2010, Commun. ACM.

[129]  Yan Solihin,et al.  CHOP: Adaptive filter-based DRAM caching for CMP server platforms , 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.

[130]  Thomas Vogelsang,et al.  Understanding the Energy Consumption of Dynamic Random Access Memories , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.

[131]  Xin Li,et al.  A Realistic Evaluation of Memory Hardware Errors and Software System Susceptibility , 2010, USENIX Annual Technical Conference.

[132]  Moinuddin K. Qureshi,et al.  Morphable memory system: a robust architecture for exploiting multi-level phase change memories , 2010, ISCA.

[133]  Jun Yang,et al.  Phase-Change Technology and the Future of Main Memory , 2010, IEEE Micro.

[134]  John Kim,et al.  Approximating age-based arbitration in on-chip networks , 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).

[135]  Onur Mutlu,et al.  Accelerating critical section execution with asymmetric multi-core architectures , 2009, ASPLOS.

[136]  Onur Mutlu,et al.  DRAM-Aware Last-Level Cache Writeback: Reducing Write-Caused Interference in Memory Systems , 2010 .

[137]  Mor Harchol-Balter,et al.  ATLAS : A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers , 2010 .

[138]  Fairness via source throttling: a configurable and high-performance fairness substrate for multi-core memory systems , 2010, ASPLOS XV.

[139]  David W. Nellans,et al.  Micro-pages: increasing DRAM efficiency with locality-aware data placement , 2010, ASPLOS XV.

[140]  Onur Mutlu,et al.  Data marshaling for multi-core architectures , 2010, ISCA.

[141]  Alexandra Fedorova,et al.  Addressing shared resource contention in multicore processors via scheduling , 2010, ASPLOS 2010.

[142]  Norman P. Jouppi,et al.  Rethinking DRAM design and organization for energy-constrained multi-cores , 2010, ISCA.

[143]  Lizy Kurian John,et al.  Elastic Refresh: Techniques to Mitigate Refresh Penalties in High Density Memory , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.

[144]  Mor Harchol-Balter,et al.  Thread Cluster Memory Scheduling: Exploiting Differences in Memory Access Behavior , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.

[145]  Lizy Kurian John,et al.  The virtual write queue: coordinating DRAM and last-level cache policies , 2010, ISCA.

[146]  Chita R. Das,et al.  Aérgia: exploiting packet latency slack in on-chip networks , 2010, ISCA.

[147]  Adam Silberstein,et al.  Benchmarking cloud serving systems with YCSB , 2010, SoCC '10.

[148]  Alexandra Fedorova,et al.  A case for NUMA-aware contention management on multicore systems , 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).

[149]  Zheng Guo,et al.  Dynamic SRAM stability characterization in 45nm CMOS , 2010, 2010 Symposium on VLSI Circuits.

[150]  Chris Fallin,et al.  CHIPPER: A low-complexity bufferless deflection router , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.

[151]  Ken Mai,et al.  FPGA-Based Solid-State Drive Prototyping Platform , 2011, 2011 IEEE 19th Annual International Symposium on Field-Programmable Custom Computing Machines.

[152]  Onur Mutlu,et al.  Prefetch-Aware Memory Controllers , 2011, IEEE Transactions on Computers.

[153]  Sai Prashanth Muralidhara,et al.  Reducing memory interference in multicore systems via application-aware memory channel partitioning , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[154]  Onur Mutlu,et al.  Kilo-NOC: A heterogeneous network-on-chip architecture for scalability and service guarantees , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).

[155]  Chris Fallin,et al.  Memory power management via dynamic voltage/frequency scaling , 2011, ICAC '11.

[156]  S. Phadke,et al.  MLP aware heterogeneous memory system , 2011, 2011 Design, Automation & Test in Europe.

[157]  Qingyuan Deng,et al.  MemScale: active low-power modes for main memory , 2011, ASPLOS XVI.

[158]  Doris Schmitt-Landsiedel,et al.  DRAM Yield Analysis and Optimization by a Statistical Design Approach , 2011, IEEE Transactions on Circuits and Systems I: Regular Papers.

[159]  Bruce F. Cockburn,et al.  Design and Characterization of a Multilevel DRAM , 2011, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[160]  Chris Fallin,et al.  Parallel application memory scheduling , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[161]  Ricardo Bianchini,et al.  Page placement in hybrid memory systems , 2011, ICS '11.

[162]  Bruce Jacob,et al.  DRAMSim2: A Cycle Accurate Memory System Simulator , 2011, IEEE Computer Architecture Letters.

[163]  Marina Thottan,et al.  Adapting router buffers for energy efficiency , 2011, CoNEXT '11.

[164]  Onur Mutlu,et al.  Prefetch-aware shared-resource management for multi-core systems , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).

[165]  Onur Mutlu,et al.  Improving GPU performance via large warps and two-level warp scheduling , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[166]  Kevin Kai-Wei Chang,et al.  Staged memory scheduling: Achieving high performance and scalability in heterogeneous systems , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).

[167]  Jun Yang,et al.  Improving write operations in MLC phase change memory , 2012, IEEE International Symposium on High-Performance Comp Architecture.

[168]  Christoforos E. Kozyrakis,et al.  Improving System Energy Efficiency with Memory Rank Subsetting , 2012, TACO.

[169]  Bianca Schroeder,et al.  Temperature management in data centers: why some (might) like it hot , 2012, SIGMETRICS '12.

[170]  Richard Veras,et al.  RAIDR: Retention-aware intelligent DRAM refresh , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).

[171]  Seung-Moon Yoo,et al.  FlexRAM: Toward an advanced Intelligent Memory system , 1999, 2012 IEEE 30th International Conference on Computer Design (ICCD).

[172]  Mahmut T. Kandemir,et al.  Addressing End-to-End Memory Access Latency in NoC-Based Multicores , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.

[173]  Thomas F. Wenisch,et al.  CoScale: Coordinating CPU and Memory System DVFS in Server Systems , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.

[174]  Norman P. Jouppi,et al.  Staged Reads: Mitigating the impact of DRAM writes on DRAM reads , 2012, IEEE International Symposium on High-Performance Comp Architecture.

[175]  Zhen Fang,et al.  Leveraging Heterogeneity in DRAM Main Memories to Accelerate Critical Word Access , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.

[176]  Onur Mutlu,et al.  Enabling Efficient and Scalable Hybrid Memories Using Fine-Granularity DRAM Cache Management , 2012, IEEE Computer Architecture Letters.

[177]  Zhe Zhang,et al.  Memory module-level testing and error behaviors for phase change memory , 2012, 2012 IEEE 30th International Conference on Computer Design (ICCD).

[178]  Kevin Kai-Wei Chang,et al.  HAT: Heterogeneous Adaptive Throttling for On-Chip Networks , 2012, 2012 IEEE 24th International Symposium on Computer Architecture and High Performance Computing.

[179]  Kevin Kai-Wei Chang,et al.  MinBD: Minimally-Buffered Deflection Routing for Energy-Efficient Interconnect , 2012, 2012 IEEE/ACM Sixth International Symposium on Networks-on-Chip.

[180]  Marvin Onabajo,et al.  Analog Circuit Design for Process Variation-Resilient Systems-on-a-Chip , 2012 .

[181]  M. Inaba,et al.  High Performance Memory Access Scheduling Using Compute-Phase Prediction and Writeback-Refresh Overlap , 2012 .

[182]  Lei Liu,et al.  A software memory partition approach for eliminating bank-level interference in multicore systems , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).

[183]  Onur Mutlu,et al.  The evicted-address filter: A unified mechanism to address both cache pollution and thrashing , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).

[184]  Thomas F. Wenisch,et al.  MultiScale: memory system DVFS with multiple memory controllers , 2012, ISLPED '12.

[185]  Rachata Ausavarungnirun,et al.  Row buffer locality aware caching policies for hybrid memories , 2012, 2012 IEEE 30th International Conference on Computer Design (ICCD).

[186]  Onur Mutlu,et al.  Error patterns in MLC NAND flash memory: Measurement, characterization, and analysis , 2012, 2012 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[187]  Bianca Schroeder,et al.  Cosmic rays don't strike twice: understanding the nature of DRAM errors and the implications for system design , 2012, ASPLOS XVII.

[188]  Onur Mutlu,et al.  A case for exploiting subarray-level parallelism (SALP) in DRAM , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).

[189]  Vilas Sridharan,et al.  A study of DRAM failures in the field , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.

[190]  Chia-Lin Yang,et al.  SECRET: Selective error correction for refresh energy reduction in DRAMs , 2012, 2012 IEEE 30th International Conference on Computer Design (ICCD).

[191]  Onur Mutlu,et al.  Bottleneck identification and scheduling in multithreaded applications , 2012, ASPLOS XVII.

[192]  Mahmut T. Kandemir,et al.  Orchestrated scheduling and prefetching for GPGPUs , 2013, ISCA.

[193]  Onur Mutlu,et al.  MISE: Providing performance predictability and improving fairness in shared main memory systems , 2013, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).

[194]  José González,et al.  Thread Row Buffers: Improving Memory Performance Isolation and Throughput in Multiprogrammed Environments , 2013, IEEE Transactions on Computers.

[195]  Dae-Hyun Kim,et al.  ArchShield: architectural framework for assisting DRAM scaling by tolerating high error rates , 2013, ISCA.

[196]  Amin Ansari,et al.  Refrint: Intelligent refresh to minimize power in on-chip multiprocessor cache hierarchies , 2013, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).

[197]  P. Zabinski,et al.  Coming Challenges with Terabit-per-Second Data Communication , 2013, IEEE Circuits and Systems Magazine.

[198]  Mahmut T. Kandemir,et al.  OWL: cooperative thread array aware scheduling techniques for improving GPGPU performance , 2013, ASPLOS '13.

[199]  Reetuparna Das,et al.  Application-to-core mapping policies to reduce memory system interference in multi-core systems , 2013, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).

[200]  Onur Mutlu,et al.  A Case for Effic ient Hardware/Soft ware Cooperative Management of Storage and Memory , 2013 .

[201]  Ming Liu,et al.  An Intelligent RAM with Serial I/Os , 2013, IEEE Micro.

[202]  Onur Mutlu,et al.  An experimental study of data retention behavior in modern DRAM devices: implications for retention time profiling mechanisms , 2013, ISCA.

[203]  José F. Martínez,et al.  Understanding and mitigating refresh overheads in high-density DDR4 DRAM systems , 2013, ISCA.

[204]  Onur Mutlu,et al.  Program interference in MLC NAND flash memory: Characterization, modeling, and mitigation , 2013, ICCD.

[205]  Ryan Kastner,et al.  RIFFA 2.0: A reusable integration framework for FPGA accelerators , 2013, 2013 23rd International Conference on Field Programmable Logic and Applications.

[206]  Onur Mutlu,et al.  Tiered-latency DRAM: A low latency and low cost DRAM architecture , 2013, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).

[207]  O Seongil,et al.  Reducing memory access latency with asymmetric DRAM bank organizations , 2013, ISCA.

[208]  Onur Mutlu,et al.  Threshold voltage distribution in MLC NAND flash memory: Characterization, analysis, and modeling , 2013, 2013 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[209]  José F. Martínez,et al.  Improving memory scheduling via processor-side load criticality information , 2013, ISCA.

[210]  Rachata Ausavarungnirun,et al.  RowClone: Fast and energy-efficient in-DRAM bulk data copy and initialization , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[211]  Moinuddin K. Qureshi,et al.  A case for Refresh Pausing in DRAM memory systems , 2013, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).

[212]  Onur Mutlu,et al.  Utility-based acceleration of multithreaded applications on asymmetric CMPs , 2013, ISCA.

[213]  Cody Cutler,et al.  Optimizing RAM-latency dominated applications , 2013, APSys.

[214]  Onur Mutlu,et al.  Memory scaling: A systems architecture perspective , 2013, 2013 5th IEEE International Memory Workshop.

[215]  Stijn Eyerman,et al.  Criticality stacks: identifying critical threads in parallel programs using synchronization behavior , 2013, ISCA.

[216]  Mahmut T. Kandemir,et al.  Evaluating STT-RAM as an energy-efficient main memory alternative , 2013, 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).

[217]  Benjamin Barras,et al.  SPICE – Simulation Program with Integrated Circuit Emphasis , 2013 .

[218]  Onur Mutlu,et al.  Error Analysis and Retention-Aware Error Management for NAND Flash Memory , 2013 .

[219]  Bruce Jacob,et al.  Coordinated refresh: Energy efficient techniques for DRAM refresh scheduling , 2013, International Symposium on Low Power Electronics and Design (ISLPED).

[220]  Rajeev Balasubramonian,et al.  Quantifying the relationship between the power delivery network and architectural policies in a 3D-stacked memory device , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[221]  Onur Mutlu,et al.  Rollback-free value prediction with approximate loads , 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).

[222]  Jie Liu,et al.  Characterizing Application Memory Error Vulnerability to Optimize Datacenter Cost via Heterogeneous-Reliability Memory , 2014, 2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks.

[223]  Hyeran Jeon,et al.  Graph processing on GPUs: Where are the bottlenecks? , 2014, 2014 IEEE International Symposium on Workload Characterization (IISWC).

[224]  Onur Mutlu,et al.  The efficacy of error mitigation techniques for DRAM retention failures: a comparative experimental study , 2014, SIGMETRICS '14.

[225]  Onur Mutlu,et al.  Mitigating Prefetcher-Caused Pollution Using Informed Caching Policies for Prefetched Blocks , 2014, ACM Trans. Archit. Code Optim..

[226]  Moinuddin K. Qureshi,et al.  Citadel: Efficiently Protecting Stacked Memory from Large Granularity Failures , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.

[227]  Reetuparna Das,et al.  Design and Evaluation of Hierarchical Rings with Deflection Routing , 2014, 2014 IEEE 26th International Symposium on Computer Architecture and High Performance Computing.

[228]  Avi Mendelson,et al.  Deep-dive analysis of the data analytics workload in CloudSuite , 2014, 2014 IEEE International Symposium on Workload Characterization (IISWC).

[229]  Lei Liu,et al.  Going vertical in memory management: Handling multiplicity by multi-policy , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).

[230]  Onur Mutlu,et al.  The Dirty-Block Index , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).

[231]  Onur Mutlu,et al.  Improving DRAM performance by parallelizing refreshes with accesses , 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).

[232]  Feifei Li,et al.  NDC: Analyzing the impact of 3D-stacked memory+logic devices on MapReduce workloads , 2014, 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).

[233]  Wongyu Shin,et al.  NUAT: A non-uniform access time memory controller , 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).

[234]  Norbert Wehn,et al.  Exploiting expendable process-margins in DRAMs for run-time performance optimization , 2014, 2014 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[235]  O Seongil,et al.  Row-buffer decoupling: A case for low-latency DRAM microarchitecture , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).

[236]  Rajeev Balasubramonian,et al.  MemZip: Exploring unconventional benefits from memory compression , 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).

[237]  Thomas P. Parnell,et al.  Modelling of the threshold voltage distributions of sub-20nm NAND flash memory , 2014, 2014 IEEE Global Communications Conference.

[238]  Amin Ansari,et al.  Mosaic: Exploiting the spatial locality of process variation to reduce refresh energy in on-chip eDRAM modules , 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).

[239]  Onur Mutlu,et al.  The Blacklisting Memory Scheduler: Achieving high performance and fairness at low cost , 2014, 2014 IEEE 32nd International Conference on Computer Design (ICCD).

[240]  Björn Andersson,et al.  Bounding memory interference delay in COTS-based multi-core systems , 2014, 2014 IEEE 19th Real-Time and Embedded Technology and Applications Symposium (RTAS).

[241]  Onur Mutlu,et al.  FIRM: Fair and High-Performance Memory Control for Persistent Memory Systems , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.

[242]  Rami G. Melhem,et al.  Refresh Now and Then , 2014, IEEE Transactions on Computers.

[243]  Yong Wang,et al.  SDF: software-defined flash for web-scale internet storage systems , 2014, ASPLOS.

[244]  Onur Mutlu,et al.  Efficient Data Mapping and Buffering Techniques for Multilevel Cell Phase-Change Memories , 2014, ACM Trans. Archit. Code Optim..

[245]  Chris Fallin,et al.  Flipping bits in memory without accessing them: An experimental study of DRAM disturbance errors , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).

[246]  Mike Ignatowski,et al.  TOP-PIM: throughput-oriented programmable processing in memory , 2014, HPDC '14.

[247]  Onur Mutlu,et al.  Research Problems and Opportunities in Memory Systems , 2014, Supercomput. Front. Innov..

[248]  Hongzhong Zheng,et al.  Co-Architecting Controllers and DRAM to Enhance DRAM Process Scaling , 2014 .

[249]  Madhu Mutyam,et al.  EFGR: An Enhanced Fine Granularity Refresh Feature for High-Performance DDR4 DRAM Devices , 2014, ACM Trans. Archit. Code Optim..

[250]  Osman S. Unsal,et al.  Neighbor-cell assisted error correction for MLC NAND flash memories , 2014, SIGMETRICS '14.

[251]  Sudhakar Yalamanchili,et al.  Harmonia: Balancing compute and memory power in high-performance GPUs , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).

[252]  Onur Mutlu,et al.  Adaptive-latency DRAM: Optimizing DRAM timing for the common-case , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).

[253]  Manos Athanassoulis,et al.  Beyond the Wall: Near-Data Processing for Databases , 2015, DaMoN.

[254]  Gu-Yeon Wei,et al.  Profiling a warehouse-scale computer , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).

[255]  Moinuddin K. Qureshi,et al.  Reducing read latency of phase change memory via early read and Turbo Read , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).

[256]  Mahmut T. Kandemir,et al.  A case for Core-Assisted Bottleneck Acceleration in GPUs: Enabling flexible data compression with assist warps , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).

[257]  Yoonho Park,et al.  Data access optimization in a processing-in-memory system , 2015, Conf. Computing Frontiers.

[258]  Mahmut T. Kandemir,et al.  Exploiting Inter-Warp Heterogeneity to Improve GPGPU Performance , 2015, 2015 International Conference on Parallel Architecture and Compilation (PACT).

[259]  Onur Mutlu,et al.  The application slowdown model: Quantifying and controlling the impact of inter-application interference at shared caches and main memory , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[260]  Onur Mutlu,et al.  AVATAR: A Variable-Retention-Time (VRT) Aware Refresh for DRAM Systems , 2015, 2015 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks.

[261]  Onur Mutlu,et al.  Simultaneous Multi Layer Access: A High Bandwidth and Low Cost 3D-Stacked Memory Interface , 2015, ArXiv.

[262]  Kiyoung Choi,et al.  A scalable processing-in-memory accelerator for parallel graph processing , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).

[263]  Thomas Willhalm,et al.  Quantifying the Performance Impact of Memory Latency and Bandwidth for Big Data Workloads , 2015, 2015 IEEE International Symposium on Workload Characterization.

[264]  Qiang Wu,et al.  A Large-Scale Study of Flash Memory Failures in the Field , 2015, SIGMETRICS 2015.

[265]  Stephen W. Keckler,et al.  Page Placement Strategies for GPUs within Heterogeneous Memory Systems , 2015, ASPLOS.

[266]  Christoforos E. Kozyrakis,et al.  Practical Near-Data Processing for In-Memory Analytics Frameworks , 2015, 2015 International Conference on Parallel Architecture and Compilation (PACT).

[267]  Franz Franchetti,et al.  Data reorganization in memory using 3D-stacked DRAM , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).

[268]  Vladimir Vlassov,et al.  Performance Characterization of In-Memory Data Analytics on a Modern Cloud Server , 2015, 2015 IEEE Fifth International Conference on Big Data and Cloud Computing.

[269]  Qiang Wu,et al.  Revisiting Memory Errors in Large-Scale Production Data Centers: Analysis and Modeling of New Trends from the Field , 2015, 2015 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks.

[270]  Rizwana Begum,et al.  Energy-Performance Trade-offs on Energy-Constrained Devices with Multi-component DVFS , 2015, 2015 IEEE International Symposium on Workload Characterization.

[271]  Chia-Lin Yang,et al.  Improving DRAM latency with dynamic asymmetric subarray , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[272]  Jongmoo Choi,et al.  Decoupled Direct Memory Access: Isolating CPU and IO Traffic by Leveraging a Dual-Data-Port DRAM , 2015, 2015 International Conference on Parallel Architecture and Compilation (PACT).

[273]  Magnus Jahre,et al.  Hybrid breadth-first search on a single-chip FPGA-CPU heterogeneous platform , 2015, 2015 25th International Conference on Field Programmable Logic and Applications (FPL).

[274]  Onur Mutlu,et al.  Data retention in MLC NAND flash memory: Characterization, optimization, and recovery , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).

[275]  Kiyoung Choi,et al.  PIM-enabled instructions: A low-overhead, locality-aware processing-in-memory architecture , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).

[276]  Stratos Idreos,et al.  JAFAR: Near-Data Processing for Databases , 2015, SIGMOD Conference.

[277]  Lavanya Subramanian,et al.  Providing High and Controllable Performance in Multicore Systems Through Shared Resource Management , 2015, ArXiv.

[278]  Onur Mutlu,et al.  Gather-Scatter DRAM: In-DRAM address translation to improve the spatial locality of non-unit strided accesses , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[279]  Onur Mutlu,et al.  Fast Bulk Bitwise AND and OR in DRAM , 2015, IEEE Computer Architecture Letters.

[280]  John Shalf,et al.  Memory Errors in Modern Systems: The Good, The Bad, and The Ugly , 2015, ASPLOS.

[281]  Jung Ho Ahn,et al.  NDA: Near-DRAM acceleration architecture leveraging commodity DRAM devices and standard memory modules , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).

[282]  Bruce Jacob,et al.  Flexible auto-refresh: Enabling scalable and energy-efficient DRAM refresh reductions , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).

[283]  Norbert Wehn,et al.  A new bank sensitive DRAMPower model for efficient design space exploration , 2016, 2016 26th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS).

[284]  Onur Mutlu,et al.  RFVP: Rollback-Free Value Prediction with Safe-to-Approximate Loads , 2016, ACM Trans. Archit. Code Optim..

[285]  Mahmut T. Kandemir,et al.  Scheduling techniques for GPU architectures with processing-in-memory capabilities , 2016, 2016 International Conference on Parallel Architecture and Compilation Techniques (PACT).

[286]  Masha Sosonkina,et al.  Joint frequency scaling of processor and DRAM , 2016, The Journal of Supercomputing.

[287]  Onur Mutlu,et al.  Low-Cost Inter-Linked Subarrays (LISA): Enabling fast inter-subarray data movement in DRAM , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[288]  Vladimir Vlassov,et al.  Micro-Architectural Characterization of Apache Spark on Batch and Stream Processing Workloads , 2016, 2016 IEEE International Conferences on Big Data and Cloud Computing (BDCloud), Social Computing and Networking (SocialCom), Sustainable Computing and Communications (SustainCom) (BDCloud-SocialCom-SustainCom).

[289]  Onur Mutlu,et al.  ChargeCache: Reducing DRAM latency by exploiting row access locality , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[290]  Onur Mutlu,et al.  BLISS: Balancing Performance, Fairness and Complexity in Memory Access Scheduling , 2016, IEEE Transactions on Parallel and Distributed Systems.

[291]  N. Wehn,et al.  Reverse Engineering of DRAMs: Row Hammer with Crosshair , 2016, MEMSYS.

[292]  Björn Andersson,et al.  Bounding and reducing memory interference in COTS-based multi-core systems , 2016, Real-Time Systems.

[293]  Onur Mutlu,et al.  Accelerating pointer chasing in 3D-stacked memory: Challenges, mechanisms, evaluation , 2016, 2016 IEEE 34th International Conference on Computer Design (ICCD).

[294]  Onur Mutlu,et al.  Accelerating Dependent Cache Misses with an Enhanced Memory Controller , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[295]  Donghyuk Lee,et al.  Reducing DRAM Latency at Low Cost by Exploiting Heterogeneity , 2016, ArXiv.

[296]  Arif Merchant,et al.  Flash Reliability in Production: The Expected and the Unexpected , 2016, FAST.

[297]  Fang Wang,et al.  MaxPB: Accelerating PCM Write by Maximizing the Power Budget Utilization , 2016, ACM Trans. Archit. Code Optim..

[298]  SPCM: The Striped Phase Change Memory , 2016, ACM Trans. Archit. Code Optim..

[299]  Kevin Kai-Wei Chang,et al.  DASH: Deadline-Aware High-Performance Memory Scheduler for Heterogeneous Systems with Hardware Accelerators , 2016, ACM Trans. Archit. Code Optim..

[300]  Onur Mutlu,et al.  Buddy-RAM: Improving the Performance and Efficiency of Bulk Bitwise Operations Using DRAM , 2016, ArXiv.

[301]  Onur Mutlu,et al.  Transparent Offloading and Mapping (TOM): Enabling Programmer-Transparent Near-Data Processing in GPU Systems , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[302]  Vivek Seshadri,et al.  Simple DRAM and Virtual Memory Abstractions to Enable Highly Efficient Memory Systems , 2016, ArXiv.

[303]  Jie Liu,et al.  SSD Failures in Datacenters: What? When? and Why? , 2016, SYSTOR.

[304]  Mingyu Gao,et al.  HRL: Efficient and flexible reconfigurable logic for near-data processing , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[305]  Aditya Agrawal,et al.  CLARA: Circular Linked-List Auto and Self Refresh Architecture , 2016, MEMSYS.

[306]  Vijay Janapa Reddi,et al.  Mobile CPU's rise to power: Quantifying the impact of generational mobile CPU design trends on performance, energy, and user satisfaction , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[307]  Mahmut T. Kandemir,et al.  Exploiting Core Criticality for Enhanced GPU Performance , 2016, SIGMETRICS.

[308]  Onur Mutlu,et al.  Enabling Accurate and Practical Online Flash Channel Modeling for Modern MLC NAND Flash Memory , 2016, IEEE Journal on Selected Areas in Communications.

[309]  Onur Mutlu,et al.  Continuous runahead: Transparent hardware acceleration for memory intensive workloads , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[310]  Onur Mutlu,et al.  PARBOR: An Efficient System-Level Technique to Detect Data-Dependent Failures in DRAM , 2016, 2016 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN).

[311]  Onur Mutlu,et al.  Understanding Latency Variation in Modern DRAM Chips: Experimental Characterization, Analysis, and Optimization , 2016, SIGMETRICS.

[312]  Onur Mutlu,et al.  Ramulator: A Fast and Extensible DRAM Simulator , 2016, IEEE Computer Architecture Letters.

[313]  Onur Mutlu,et al.  SoftMC: A Flexible and Practical Open-Source Infrastructure for Enabling Experimental DRAM Studies , 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[314]  Onur Mutlu,et al.  The reach profiler (REAPER): Enabling the mitigation of DRAM retention failures via profiling at aggressive conditions , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).

[315]  Rachata Ausavarungnirun,et al.  Design-Induced Latency Variation in Modern DRAM Chips: Characterization, Analysis, and Latency Reduction Mechanisms , 2017, SIGMETRICS.

[316]  Onur Mutlu,et al.  A Case for Memory Content-Based Detection and Mitigation of Data-Dependent Failures in DRAM , 2017, IEEE Computer Architecture Letters.

[317]  Gennady Pekhimenko,et al.  Design-Induced Latency Variation in Modern DRAM Chips , 2016, Proc. ACM Meas. Anal. Comput. Syst..

[318]  Norbert Wehn,et al.  A Bank-Wise DRAM Power Model for System Simulations , 2017, RAPIDO.

[319]  Srinivas Devadas,et al.  Banshee: Bandwidth-Efficient DRAM Caching via Software/Hardware Cooperation , 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[320]  Onur Mutlu,et al.  Carpool: a bufferless on-chip network supporting adaptive multicast and hotspot alleviation , 2017, ICS.

[321]  Onur Mutlu,et al.  Vulnerabilities in MLC NAND Flash Memory Programming: Experimental Analysis, Exploits, and Mitigation Techniques , 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[322]  Onur Mutlu,et al.  LazyPIM: An Efficient Cache Coherence Mechanism for Processing-in-Memory , 2017, IEEE Computer Architecture Letters.

[323]  Yixin Luo,et al.  Improving the reliability of chip-off forensic analysis of NAND flash memory devices , 2017, Digit. Investig..

[324]  Onur Mutlu,et al.  Error Characterization, Mitigation, and Recovery in Flash-Memory-Based Solid-State Drives , 2017, Proceedings of the IEEE.

[325]  Yoongu Kim,et al.  Architectural Techniques to Enhance DRAM Scaling , 2018 .