Architecting Memory Systems for Emerging Technologies
R. Dreslinski | Cao Gao | Antony Gutierrez | Q. Zheng | Nilmini Abeyratne | Yajing Chen | Jonathan Beaumont | Dong-hyeon Park
[1] Masashi Horiguchi,et al. A flexible redundancy technique for high-density DRAMs , 1991 .
[2] John von Neumann,et al. First draft of a report on the EDVAC , 1993, IEEE Annals of the History of Computing.
[3] Robert D. Blumofe,et al. Scheduling multithreaded computations by work stealing , 1994, Proceedings 35th Annual Symposium on Foundations of Computer Science.
[4] Soo-In Cho,et al. A 32-bank 1 Gb DRAM with 1 GB/s bandwidth , 1996, 1996 IEEE International Solid-State Circuits Conference. Digest of Technical Papers, ISSCC.
[5] Martin L. Kersten,et al. Database Architecture Optimized for the New Bottleneck: Memory Access , 1999, VLDB.
[6] H. Ikeda,et al. High-speed DRAM architecture development , 1999 .
[7] Trevor N. Mudge,et al. A performance comparison of contemporary DRAM architectures , 1999, ISCA.
[8] Nihar R. Mahapatra,et al. The processor-memory bottleneck: problems and solutions , 1999, CROS.
[9] William J. Dally,et al. Memory access scheduling , 2000, Proceedings of 27th International Symposium on Computer Architecture.
[10] Michael Mitzenmacher,et al. The Power of Two Choices in Randomized Load Balancing , 2001, IEEE Trans. Parallel Distributed Syst..
[11] Kyeong-Sik Min,et al. A fast pump-down V_BB generator for sub-1.5-V DRAMs , 2001 .
[12] Jens H. Krüger,et al. A Survey of General‐Purpose Computation on Graphics Hardware , 2007, Eurographics.
[13] Koen De Bosschere,et al. XOR-based hash functions , 2005, IEEE Transactions on Computers.
[14] John L. Henning. SPEC CPU2006 benchmark descriptions , 2006, CARN.
[15] Naga K. Govindaraju,et al. GPGPU: general-purpose computation on graphics hardware , 2006, SC.
[16] Bruce Jacob,et al. Memory Systems: Cache, DRAM, Disk , 2007 .
[17] David Kirk,et al. NVIDIA CUDA software and GPU parallel computing architecture , 2007, ISMM '07.
[18] M. McDaniel,et al. Prospective Memory: An Overview and Synthesis of an Emerging Field , 2007 .
[19] Stephen C. Graves,et al. Little's Law , 2008 .
[20] Naga K. Govindaraju,et al. Mars: A MapReduce Framework on graphics processors , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[21] 박기태,et al. Semiconductor memory device with three-dimensional array structure and repair method thereof , 2008 .
[22] Henry Wong,et al. Analyzing CUDA workloads using a detailed GPU simulator , 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[23] Kevin Skadron,et al. Rodinia: A benchmark suite for heterogeneous computing , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[24] John Y. Chen,et al. GPU technology trends and future requirements , 2009, 2009 IEEE International Electron Devices Meeting (IEDM).
[25] Luan Tran,et al. 45nm low power CMOS logic compatible embedded STT MRAM utilizing a reverse-connection 1T/1MTJ cell , 2009, 2009 IEEE International Electron Devices Meeting (IEDM).
[26] Arijit Raychowdhury,et al. Design space and scalability exploration of 1T-1STT MTJ memory arrays in the presence of variability and disturbances , 2009, 2009 IEEE International Electron Devices Meeting (IEDM).
[27] Onur Mutlu,et al. Architecting phase change memory as a scalable DRAM alternative , 2009, ISCA '09.
[28] Young-Hyun Jun,et al. 1.2V 1.6Gb/s 56nm 6F² 4Gb DDR3 SDRAM with hybrid-I/O sense amplifier and segmented sub-array architecture , 2009, 2009 IEEE International Solid-State Circuits Conference - Digest of Technical Papers.
[29] Samuel Williams,et al. Optimization of sparse matrix-vector multiplication on emerging multicore platforms , 2009, Parallel Comput..
[30] Tor M. Aamodt,et al. Complexity effective memory access scheduling for many-core accelerator architectures , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[31] J. Nowak,et al. Switching distributions and write reliability of perpendicular spin torque MRAM , 2010, 2010 International Electron Devices Meeting.
[32] Luca Benini,et al. An efficient distributed memory interface for many-core platform with 3D stacked DRAM , 2010, 2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010).
[33] A. Driskill-Smith,et al. Fully integrated 54nm STT-RAM with the smallest bit cell dimension for high density memory application , 2010, 2010 International Electron Devices Meeting.
[34] Mor Harchol-Balter,et al. ATLAS: A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers , 2010 .
[35] David W. Nellans,et al. Micro-pages: increasing DRAM efficiency with locality-aware data placement , 2010, ASPLOS XV.
[36] Norman P. Jouppi,et al. Rethinking DRAM design and organization for energy-constrained multi-cores , 2010, ISCA.
[37] Bruce Jacob,et al. Fine-Grained Activation for Power Reduction in DRAM , 2010, IEEE Micro.
[38] Yoshihiro Ueda,et al. A 64Mb MRAM with clamped-reference and adequate-reference schemes , 2010, 2010 IEEE International Solid-State Circuits Conference - (ISSCC).
[39] Masashi Horiguchi,et al. Nanoscale Memory Repair , 2011, Integrated Circuits and Systems.
[40] Ki-Whan Song,et al. A 58nm 1.8V 1Gb PRAM with 6.4MB/s program BW , 2011, 2011 IEEE International Solid-State Circuits Conference.
[41] Shunfei Chen,et al. MARSS: A full system simulator for multicore x86 CPUs , 2011, 2011 48th ACM/EDAC/IEEE Design Automation Conference (DAC).
[42] William J. Dally,et al. GPUs and the Future of Parallel Computing , 2011, IEEE Micro.
[43] Bruce Jacob,et al. DRAMSim2: A Cycle Accurate Memory System Simulator , 2011, IEEE Computer Architecture Letters.
[44] Jung Ho Ahn,et al. CACTI-3DD: Architecture-level modeling for 3D die-stacked DRAM main memory , 2012, 2012 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[45] Kevin Kai-Wei Chang,et al. Staged memory scheduling: Achieving high performance and scalability in heterogeneous systems , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).
[46] Richard Veras,et al. RAIDR: Retention-aware intelligent DRAM refresh , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).
[47] Jong-Ho Kang,et al. A 1.2V 23nm 6F² 4Gb DDR3 SDRAM with local-bitline sense amplifier, hybrid LIO sense amplifier and dummy-less array architecture , 2012, 2012 IEEE International Solid-State Circuits Conference.
[48] Wen-mei W. Hwu,et al. Parboil: A Revised Benchmark Suite for Scientific and Commercial Throughput Computing , 2012 .
[49] N. Shimomura,et al. Impact of ultra low power and fast write operation of advanced perpendicular MTJ on power reduction for high-performance mobile CPU , 2012, 2012 International Electron Devices Meeting.
[50] Cong Xu,et al. NVSim: A Circuit-Level Performance, Energy, and Area Model for Emerging Nonvolatile Memory , 2012, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[51] 藤田 忍,et al. Magnetic random access memory and a memory system , 2012 .
[52] Qi Wang,et al. A 20nm 1.8V 8Gb PRAM with 40MB/s program bandwidth , 2012, 2012 IEEE International Solid-State Circuits Conference.
[53] Onur Mutlu,et al. A case for exploiting subarray-level parallelism (SALP) in DRAM , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).
[54] Keshav Pingali,et al. A quantitative study of irregular programs on GPUs , 2012, 2012 IEEE International Symposium on Workload Characterization (IISWC).
[55] Meng-Fan Chang,et al. An Offset-Tolerant Fast-Random-Read Current-Sampling-Based Sense Amplifier for Small-Cell-Current Nonvolatile Memory , 2013, IEEE Journal of Solid-State Circuits.
[56] Meng-Fan Chang,et al. A High-Speed 7.2-ns Read-Write Random Access 4-Mb Embedded Resistive RAM (ReRAM) Macro Using Process-Variation-Tolerant Current-Mode Read Schemes , 2013, IEEE Journal of Solid-State Circuits.
[57] Jan Lindström,et al. IBM solidDB: In-Memory Database Optimized for Extreme Speed and Availability , 2013, IEEE Data Eng. Bull..
[58] Tony Tung,et al. Scaling Memcache at Facebook , 2013, NSDI.
[59] Tao Li,et al. Exploring high-performance and energy proportional interface for phase change memory systems , 2013, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).
[60] Onur Mutlu,et al. Memory scaling: A systems architecture perspective , 2013, 2013 5th IEEE International Memory Workshop.
[61] J. Slaughter,et al. A Fully Functional 64 Mb DDR3 ST-MRAM Built on 90 nm CMOS Technology , 2013, IEEE Transactions on Magnetics.
[62] Mahmut T. Kandemir,et al. Evaluating STT-RAM as an energy-efficient main memory alternative , 2013, 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[63] Doris Schmitt-Landsiedel,et al. Time-differential sense amplifier for sub-80mV bitline voltage embedded STT-MRAM in 40nm CMOS , 2013, 2013 IEEE International Solid-State Circuits Conference Digest of Technical Papers.
[64] Seong-Ook Jung,et al. An Offset-Canceling Triple-Stage Sensing Circuit for Deep Submicrometer STT-RAM , 2014, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
[65] Rajeev Balasubramonian,et al. Managing DRAM Latency Divergence in Irregular GPGPU Applications , 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.
[66] Laura Carrington,et al. Evaluation of emerging memory technologies for HPC, data intensive applications , 2014, 2014 IEEE International Conference on Cluster Computing (CLUSTER).
[67] Dilpreet Singh,et al. A survey on platforms for big data analytics , 2014, Journal of Big Data.
[68] Yuan Xie,et al. Enabling high-performance LPDDRx-compatible MRAM , 2014, 2014 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED).
[69] S. Le,et al. Perpendicular spin transfer torque magnetic random access memories with high spin torque efficiency and thermal stability for embedded applications (invited) , 2014 .
[70] Tao Zhang,et al. Half-DRAM: A high-bandwidth and low-power DRAM architecture from the rethinking of fine-grained activation , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).
[71] Seung H. Kang,et al. Systematic optimization of 1 Gbit perpendicular magnetic tunnel junction arrays for 28 nm embedded STT-MRAM and beyond , 2015, 2015 IEEE International Electron Devices Meeting (IEDM).
[72] Seong-Ook Jung,et al. Latch Offset Cancellation Sense Amplifier for Deep Submicrometer STT-RAM , 2015, IEEE Transactions on Circuits and Systems I: Regular Papers.
[73] Chankyung Kim,et al. A covalent-bonded cross-coupled current-mode sense amplifier for STT-MRAM with 1T1MTJ common source-line structure array , 2015, 2015 IEEE International Solid-State Circuits Conference - (ISSCC) Digest of Technical Papers.
[74] Norbert Wehn,et al. DRAMSpec: A High-Level DRAM Timing, Power and Area Exploration Tool , 2015, International Journal of Parallel Programming.
[75] Jaejin Lee,et al. Design considerations of HBM stacked DRAM and the memory architecture extension , 2015, 2015 IEEE Custom Integrated Circuits Conference (CICC).
[76] Ronald G. Dreslinski,et al. Enhancing DRAM Self-Refresh for Idle Power Reduction , 2016, ISLPED.
[77] Jeong-Heon Park,et al. Dependence of Voltage and Size on Write Error Rates in Spin-Transfer Torque Magnetic Random-Access Memory , 2016, IEEE Magnetics Letters.
[78] Kee-Won Kwon,et al. Inverted bit-line sense amplifier with offset-cancellation capability , 2016 .
[79] M. Bangar,et al. Systematic validation of 2x nm diameter perpendicular MTJ arrays and MgO barrier for sub-10 nm embedded STT-MRAM with practically unlimited endurance , 2016, 2016 IEEE International Electron Devices Meeting (IEDM).
[80] Henk Corporaal,et al. Configurable XOR Hash Functions for Banked Scratchpad Memories in GPUs , 2016, IEEE Transactions on Computers.
[81] Milan Radulovic,et al. Performance Impact of a Slower Main Memory: A case study of STT-MRAM in HPC , 2016, MEMSYS.
[82] H. Kanaya,et al. 4Gbit density STT-MRAM using perpendicular MTJ realized with compact cell structure , 2016, 2016 IEEE International Electron Devices Meeting (IEDM).
[83] Arun Sharma,et al. Scalable machine‐learning algorithms for big data analytics: a comprehensive review , 2016, Wiley Interdiscip. Rev. Data Min. Knowl. Discov..
[84] Kang L. Wang,et al. Write Error Rate and Read Disturbance in Electric-Field-Controlled Magnetic Random-Access Memory , 2017, IEEE Magnetics Letters.
[85] William J. Dally,et al. Architecting an Energy-Efficient DRAM System for GPUs , 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[86] Akihito Yamamoto,et al. A 4Gb LPDDR2 STT-MRAM with compact 9F² 1T1MTJ cell and hierarchical bitline architecture , 2017, 2017 IEEE International Solid-State Circuits Conference (ISSCC).
[87] Jing Li,et al. Evaluating Row Buffer Locality in Future Non-Volatile Main Memories , 2018, ArXiv.
[88] Alberto Cano,et al. A survey on graphic processing unit computing for large‐scale data mining , 2018, WIREs Data Mining Knowl. Discov..