BACH: A Bandwidth-Aware Hybrid Cache Hierarchy Design with Nonvolatile Memories
暂无分享,去创建一个
Tao Zhang | Cong Xu | Yuan Xie | Jishen Zhao
[1] M. Hosomi,et al. A novel nonvolatile memory with spin torque transfer magnetization switching: spin-ram , 2005, IEEE InternationalElectron Devices Meeting, 2005. IEDM Technical Digest..
[2] Xiaoxia Wu,et al. Exploration of 3D stacked L2 cache design for high performance and efficient thermal control , 2009, ISLPED.
[3] N. Gura,et al. UltraSPARC T2: A highly-treaded, power-efficient, SPARC SOC , 2007, 2007 IEEE Asian Solid-State Circuits Conference.
[4] Yuan Xie,et al. Simple but Effective Heterogeneous Main Memory with On-Chip Memory Controller Support , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[5] Vijayalakshmi Srinivasan,et al. Enhancing lifetime and security of PCM-based Main Memory with Start-Gap Wear Leveling , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[6] Yiran Chen,et al. A novel architecture of the 3D stacked MRAM L2 cache for CMPs , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.
[7] Kinam Kim,et al. Highly manufacturable high density phase change memory of 64Mb and beyond , 2004, IEDM Technical Digest. IEEE International Electron Devices Meeting, 2004..
[8] Luca Benini,et al. An efficient distributed memory interface for many-core platform with 3D stacked DRAM , 2010, 2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010).
[9] N. Shimomura,et al. Impact of ultra low power and fast write operation of advanced perpendicular MTJ on power reduction for high-performance mobile CPU , 2012, 2012 International Electron Devices Meeting.
[10] Tao Zhang,et al. MorphCache: A Reconfigurable Adaptive Multi-level Cache hierarchy , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.
[11] Ruhi Sarikaya,et al. Predicting Program Behavior Based On Objective Function Minimization , 2007, 2007 IEEE 10th International Symposium on Workload Characterization.
[12] Engin Ipek,et al. Dynamically replicated memory: building reliable systems from nanoscale resistive memories , 2010, ASPLOS XV.
[13] Stanley F. Chen,et al. An Empirical Study of Smoothing Techniques for Language Modeling , 1996, ACL.
[14] Kinam Kim,et al. Bi-layered RRAM with unlimited endurance and extremely uniform switching , 2011, 2011 Symposium on VLSI Technology - Digest of Technical Papers.
[15] Krisztián Flautner,et al. PicoServer: using 3D stacking technology to enable a compact energy efficient chip multiprocessor , 2006, ASPLOS XII.
[16] Ruhi Sarikaya,et al. Runtime workload behavior prediction using statistical metric modeling with application to dynamic power management , 2010, IEEE International Symposium on Workload Characterization (IISWC'10).
[17] Gabriel H. Loh,et al. 3D-Stacked Memory Architectures for Multi-core Processors , 2008, 2008 International Symposium on Computer Architecture.
[18] Young-Hyun Jun,et al. A 1.2 V 12.8 GB/s 2 Gb Mobile Wide-I/O DRAM With 4 $\times$ 128 I/Os Using TSV Based Stacking , 2011, IEEE Journal of Solid-State Circuits.
[19] Frederick T. Chen,et al. Evidence and solution of over-RESET problem for HfOX based resistive memory with sub-ns switching speed and high endurance , 2010, 2010 International Electron Devices Meeting.
[20] Brian Rogers,et al. Scaling the bandwidth wall: challenges in and avenues for CMP scaling , 2009, ISCA '09.
[21] Cong Xu,et al. Moguls: A model to explore the memory hierarchy for bandwidth improvements , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).
[22] Yuan Xie,et al. Design space exploration for 3D architectures , 2006, JETC.
[23] F ChenStanley,et al. An Empirical Study of Smoothing Techniques for Language Modeling , 1996, ACL.
[24] Siddharth Gaba,et al. Nanoscale resistive memory with intrinsic diode characteristics and long endurance , 2010 .
[25] Jaehyuk Huh,et al. Exploring the design space of future CMPs , 2001, Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques.
[26] Doug Burger,et al. An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches , 2002, ASPLOS X.
[27] Sandhya Dwarkadas,et al. Characterizing and predicting program behavior and its variability , 2003, 2003 12th International Conference on Parallel Architectures and Compilation Techniques.
[28] Sanjeev Kumar,et al. Dynamic tracking of page miss ratio curve for memory management , 2004, ASPLOS XI.
[29] Chenjie Yu,et al. Off-chip memory bandwidth minimization through cache partitioning for multi-core platforms , 2010, Design Automation Conference.
[30] Jung Ho Ahn,et al. McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[31] Fredrik Larsson,et al. Simics: A Full System Simulation Platform , 2002, Computer.
[32] Karin Strauss,et al. Use ECP, not ECC, for hard failures in resistive memories , 2010, ISCA.
[33] M. Facchini,et al. Stackable memory of 3D chip integration for mobile applications , 2008, 2008 IEEE International Electron Devices Meeting.
[34] James R. Goodman,et al. Memory Bandwidth Limitations of Future Microprocessors , 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).
[35] H. Noguchi,et al. Progress of STT-MRAM technology and the effect on normally-off computing systems , 2012, 2012 International Electron Devices Meeting.
[36] Christian Bienia,et al. Benchmarking modern multiprocessors , 2011 .
[37] Kaustav Banerjee,et al. A thermally-aware performance analysis of vertically integrated (3-D) processor-memory hierarchy , 2006, 2006 43rd ACM/IEEE Design Automation Conference.
[38] Norman P. Jouppi,et al. FREE-p: Protecting non-volatile memory against both hard and soft errors , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.
[39] Hsien-Hsin S. Lee,et al. SAFER: Stuck-At-Fault Error Recovery for Memories , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.
[40] Luan Tran,et al. 45nm low power CMOS logic compatible embedded STT MRAM utilizing a reverse-connection 1T/1MTJ cell , 2009, 2009 IEEE International Electron Devices Meeting (IEDM).
[41] Martin Burtscher,et al. Bridging the processor-memory performance gap with 3D IC technology , 2005, IEEE Design & Test of Computers.
[42] Hsien-Hsin S. Lee,et al. An optimized 3D-stacked memory architecture by exploiting excessive, high-density TSV bandwidth , 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.
[43] L. Goux,et al. Ultralow sub-500nA operating current high-performance TiN\Al2O3\HfO2\Hf\TiN bipolar RRAM achieved through understanding-based stack-engineering , 2012, 2012 Symposium on VLSI Technology (VLSIT).
[44] Erik Lindholm,et al. NVIDIA Tesla: A Unified Graphics and Computing Architecture , 2008, IEEE Micro.
[45] Yan Solihin,et al. Fair cache sharing and partitioning in a chip multiprocessor architecture , 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004..
[46] Shih-Hung Chen,et al. Phase-change random access memory: A scalable technology , 2008, IBM J. Res. Dev..
[47] Young-Hyun Jun,et al. A 1.2V 12.8GB/s 2Gb mobile Wide-I/O DRAM with 4×128 I/Os using TSV-based stacking , 2011, 2011 IEEE International Solid-State Circuits Conference.
[48] H. Noguchi,et al. Novel hybrid DRAM/MRAM design for reducing power of high performance mobile CPU , 2012, 2012 International Electron Devices Meeting.
[49] Sally A. McKee,et al. Reflections on the memory wall , 2004, CF '04.
[50] Cong Xu,et al. NVSim: A Circuit-Level Performance, Energy, and Area Model for Emerging Nonvolatile Memory , 2012, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[51] Onur Mutlu,et al. Enabling Efficient and Scalable Hybrid Memories Using Fine-Granularity DRAM Cache Management , 2012, IEEE Computer Architecture Letters.
[52] Hsien-Hsin S. Lee,et al. Security refresh: prevent malicious wear-out and increase durability for phase-change memory with dynamically randomized address mapping , 2010, ISCA.
[53] E. Belhaire,et al. Macro-model of Spin-Transfer Torque based Magnetic Tunnel Junction device for hybrid Magnetic-CMOS design , 2006, 2006 IEEE International Behavioral Modeling and Simulation Workshop.
[54] Yuan Xie,et al. Cost-aware three-dimensional (3D) many-core multiprocessor design , 2010, Design Automation Conference.
[55] K. Saban. Xilinx Stacked Silicon Interconnect Technology Delivers Breakthrough FPGA Capacity , Bandwidth , and Power Efficiency , 2009 .
[56] Guangyu Sun. Moguls: A Model to Explore the Memory Hierarchy for Throughput Computing , 2014 .
[57] L. Goux,et al. Dynamic ‘hour glass’ model for SET and RESET in HfO2 RRAM , 2012, 2012 Symposium on VLSI Technology (VLSIT).
[58] Patrick Dorsey. Xilinx Stacked Silicon Interconnect Technology Delivers Breakthrough FPGA Capacity, Bandwidth, and Power Efficiency , 2010 .
[59] T. Mudge,et al. Drowsy caches: simple techniques for reducing leakage power , 2002, Proceedings 29th Annual International Symposium on Computer Architecture.
[60] Xiaoxia Wu,et al. Hybrid cache architecture with disparate memory technologies , 2009, ISCA '09.
[61] Norman P. Jouppi,et al. Reconfigurable caches and their application to media processing , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).