Data-Centric Computing Frontiers: A Survey On Processing-In-Memory
[1] Makoto Motoyoshi,et al. Through-Silicon Via (TSV) , 2009, Proceedings of the IEEE.
[2] Franz Franchetti,et al. A 3D-stacked logic-in-memory accelerator for application-specific data intensive computing , 2013, 2013 IEEE International 3D Systems Integration Conference (3DIC).
[3] Duncan G. Elliott,et al. Computational RAM: Implementing Processors in Memory , 1999, IEEE Des. Test Comput..
[4] Robert S. Germain,et al. Using the Active Storage Fabrics model to address petascale storage challenges , 2009, PDSW '09.
[5] Francisco J. Cazorla,et al. Kilo-instruction processors: overcoming the memory wall , 2005, IEEE Micro.
[6] Marco Minutoli,et al. Implementing Radix Sort on Emu 1 , 2015 .
[7] Werner Weber,et al. Performance improvement of the memory hierarchy of RISC-systems by application of 3-D-technology , 1995, 1995 Proceedings. 45th Electronic Components and Technology Conference.
[8] Peter M. Kogge,et al. Exploring the Possible Past Futures of a Single Part Type Multi-core PIM Chip , 2010, 2010 International Workshop on Innovative Architecture for Future Generation High Performance.
[9] Katherine Yelick,et al. A Case for Intelligent DRAM: IRAM , 1998 .
[10] Rae A. Earnshaw,et al. State of the Art in Computer Graphics: Visualization and Modeling , 2011 .
[11] Jaejin Lee,et al. 25.2 A 1.2V 8Gb 8-channel 128GB/s high-bandwidth memory (HBM) stacked DRAM with effective microbump I/O test methods using 29nm process and TSV , 2014, 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC).
[12] David A. Patterson,et al. Latency lags bandwidth , 2004, CACM.
[13] J. Jeddeloh,et al. Hybrid memory cube new DRAM architecture increases density and performance , 2012, 2012 Symposium on VLSI Technology (VLSIT).
[14] Fabrice Paillet,et al. FIVR — Fully integrated voltage regulators on 4th generation Intel® Core™ SoCs , 2014, 2014 IEEE Applied Power Electronics Conference and Exposition - APEC 2014.
[15] Harold S. Stone,et al. A Logic-in-Memory Computer , 1970, IEEE Transactions on Computers.
[16] Hiroaki Kobayashi,et al. Vertically integrated processor and memory module design for vector supercomputers , 2013, 2013 IEEE International 3D Systems Integration Conference (3DIC).
[17] M. Mitchell Waldrop,et al. The chips are down for Moore’s law , 2016, Nature.
[18] Sascha Vongehr,et al. The Missing Memristor has Not been Found , 2015, Scientific Reports.
[19] David K. McAllister,et al. Fast Matrix Multiplies Using Graphics Hardware , 2001, ACM/IEEE SC 2001 Conference (SC'01).
[20] Karthikeyan Sankaralingam,et al. Memory processing units , 2014, 2014 IEEE Hot Chips 26 Symposium (HCS).
[21] J. Thomas Pawlowski,et al. Hybrid memory cube (HMC) , 2011, 2011 IEEE Hot Chips 23 Symposium (HCS).
[22] Seung-Moon Yoo,et al. FlexRAM: Toward an advanced Intelligent Memory system , 1999, 2012 IEEE 30th International Conference on Computer Design (ICCD).
[23] Phillip Stanley-Marbell,et al. Pinned to the walls — Impact of packaging and application properties on the memory and power walls , 2011, IEEE/ACM International Symposium on Low Power Electronics and Design.
[24] L. Chua. Memristor-The missing circuit element , 1971 .
[25] Christoforos E. Kozyrakis,et al. Practical Near-Data Processing for In-Memory Analytics Frameworks , 2015, 2015 International Conference on Parallel Architecture and Compilation (PACT).
[26] Dave Brown,et al. Supplementary Material for An Efficient and Scalable Semiconductor Architecture for Parallel Automata Processing , 2013 .
[27] Fred J. Pollack. New microarchitecture challenges in the coming generations of CMOS process technologies (keynote address)(abstract only) , 1999, MICRO.
[28] Jan Reineke,et al. Ascertaining Uncertainty for Efficient Exact Cache Analysis , 2017, CAV.
[29] Mike Ignatowski,et al. A new perspective on processing-in-memory architecture design , 2013, MSPC '13.
[30] Y. Fujita,et al. A 7.68 GIPS 3.84 GB/s 1W parallel image processing RAM integrating a 16 Mb DRAM and 128 processors , 1996, 1996 IEEE International Solid-State Circuits Conference. Digest of Technical Papers, ISSCC.
[31] Tom Coughlin. Crossing the Chasm to New Solid-State Storage Architectures [The Art of Storage] , 2016, IEEE Consumer Electron. Mag..
[32] Stuart R. Ball. Embedded microprocessor systems , 1996 .
[33] Kunle Olukotun,et al. Energy-Efficient Abundant-Data Computing: The N3XT 1,000x , 2015, Computer.
[34] Sally A. McKee,et al. Hitting the memory wall: implications of the obvious , 1995, CARN.
[35] Brian Rogers,et al. Scaling the bandwidth wall: challenges in and avenues for CMP scaling , 2009, ISCA '09.
[36] David Daly,et al. The cache and memory subsystems of the IBM POWER8 processor , 2015, IBM J. Res. Dev..
[37] Wolfgang Straßer,et al. GRAMMY: High Performance Graphics Using Graphics Memories , 1996 .
[38] Jian Xu,et al. Demystifying 3D ICs: the pros and cons of going vertical , 2005, IEEE Design & Test of Computers.
[39] William H. Kautz,et al. Cellular Logic-in-Memory Arrays , 1969, IEEE Transactions on Computers.
[40] Peter M. Kogge,et al. Facing the Exascale Energy Wall , 2010, 2010 International Workshop on Innovative Architecture for Future Generation High Performance.
[41] Michael F. Deering,et al. FBRAM: a new form of memory optimized for 3D graphics , 1994, SIGGRAPH.
[42] R. H. Dennard. Technical literature [Reprint of "Field-Effect Transistor Memory" (US Patent No. 3,387,286)] , 2008, IEEE Solid-State Circuits Newsletter.
[43] Krishna M. Kavi,et al. Processing-in-Memory: Exploring the Design Space , 2015, ARCS.
[44] Zhen Fang,et al. Quantifying the performance contribution of various aspects of AMOs , 2022 .
[45] Guang R. Gao,et al. Processing In Memory: Chips to Petaflops , 1997, ISCA 1997.
[46] William J. Dally,et al. Smart Memories: a modular reconfigurable architecture , 2000, ISCA '00.
[47] John Shalf,et al. Computing beyond Moore's Law , 2015, Computer.
[48] Hiroshi Nakamura,et al. A scalable 3D heterogeneous multi-core processor with inductive-coupling thruchip interface , 2013, 2013 IEEE Hot Chips 25 Symposium (HCS).
[49] Zhen Fang,et al. Active memory controller , 2012, The Journal of Supercomputing.
[50] Samuel Williams,et al. Roofline: an insightful visual performance model for multicore architectures , 2009, CACM.
[51] Peter M. Kogge,et al. Updating the Energy Model for Future Exascale Systems , 2015, ISC.
[52] Paul D. Franzon,et al. A low power 3D integrated FFT engine using hypercube memory division , 2009, ISLPED.
[53] Christos Faloutsos,et al. Active Disks for Large-Scale Data Processing , 2001, Computer.
[54] Noah Treuhaft,et al. Intelligent RAM (IRAM): the industrial setting, applications, and architectures , 1997, Proceedings International Conference on Computer Design VLSI in Computers and Processors.
[55] N. Okumura,et al. A multimedia 32 b RISC microprocessor with 16 Mb DRAM , 1996, 1996 IEEE International Solid-State Circuits Conference. Digest of Technical Papers, ISSCC.
[56] Houman Homayoun,et al. Wide I/O or LPDDR? Exploration and analysis of performance, power and temperature trade-offs of emerging DRAM technologies in embedded MPSoCs , 2015, 2015 33rd IEEE International Conference on Computer Design (ICCD).
[57] Peter M. Kogge,et al. EXECUBE-A New Architecture for Scaleable MPPs , 1994, 1994 International Conference on Parallel Processing Vol. 1.
[58] David A. Patterson,et al. Computer Architecture, Fifth Edition: A Quantitative Approach , 2011 .
[59] Tejas Karkhanis,et al. Active Memory Cube: A processing-in-memory architecture for exascale systems , 2015, IBM J. Res. Dev..
[60] Kevin W. Boyack,et al. Data-centric computing with the Netezza architecture. , 2006 .
[61] Haitham Akkary,et al. Simultaneous continual flow pipeline architecture , 2011, 2011 IEEE 29th International Conference on Computer Design (ICCD).
[62] Peter M. Kogge,et al. Facing the Exascale Energy Wall. , 2010 .
[63] Mikko H. Lipasti,et al. Data compression for thermal mitigation in the Hybrid Memory Cube , 2013, 2013 IEEE 31st International Conference on Computer Design (ICCD).
[64] Andreas Schilling,et al. Eliminating the Z-Buffer bottleneck , 1995, Proceedings the European Design and Test Conference. ED&TC 1995.
[65] James A. Kahle,et al. The Cell Processor Architecture , 2005, MICRO.
[66] Maya Gokhale,et al. Processing in Memory: The Terasys Massively Parallel PIM Array , 1995, Computer.
[67] Dave Bergeron,et al. More than Moore , 2008, CICC.
[68] D. Burger,et al. Memory Bandwidth Limitations of Future Microprocessors , 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).
[69] Michael Gschwind,et al. The IBM Blue Gene/Q Compute Chip , 2012, IEEE Micro.
[70] Parthasarathy Ranganathan,et al. From Microprocessors to Nanostores: Rethinking Data-Centric Systems , 2011, Computer.
[71] Alan Gara. The long term impact of codesign , 2012, 2012 SC Companion: High Performance Computing, Networking Storage and Analysis.
[72] P. Townsend,et al. High Performance Computing for Computer Graphics and Visualisation , 1996, Springer London.
[73] Josep Llosa,et al. Out-of-order commit processors , 2004, 10th International Symposium on High Performance Computer Architecture (HPCA'04).
[74] Nam Sung Kim,et al. Reevaluating the latency claims of 3D stacked memories , 2013, 2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC).
[75] José R. Brunheroto,et al. Accelerating LBM and LQCD Application Kernels by In-Memory Processing , 2015, ISC.
[76] Rainer Buchty,et al. Revealing Potential Performance Improvements by Utilizing Hybrid Work-Sharing for Resource-Intensive Seismic Applications , 2015, 2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing.
[77] Jung Ho Ahn,et al. DRAMA: An Architecture for Accelerated Processing Near Memory , 2015, IEEE Computer Architecture Letters.
[78] Youngil Kim,et al. Analysis of thermal behavior for 3D integration of DRAM , 2014, The 18th IEEE International Symposium on Consumer Electronics (ISCE 2014).
[79] John von Neumann,et al. First draft of a report on the EDVAC , 1993, IEEE Annals of the History of Computing.
[80] David L. Weaver,et al. The SPARC architecture manual : version 9 , 1994 .
[81] Peter M. Kogge,et al. Combined DRAM and logic chip for massively parallel systems , 1995, Proceedings Sixteenth Conference on Advanced Research in VLSI.
[82] Robert C. Minnick,et al. A Survey of Microcellular Research , 1967, JACM.
[83] Mike Ignatowski,et al. TOP-PIM: throughput-oriented programmable processing in memory , 2014, HPDC '14.
[84] Tack-Don Han,et al. An effective memory-processor integrated architecture for computer vision , 1997, Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162).
[85] Duncan G. Elliott,et al. Computational RAM: A Memory-SIMD Hybrid and Its Application to DSP , 1992, 1992 Proceedings of the IEEE Custom Integrated Circuits Conference.
[86] Stefanos Kaxiras,et al. Distributed Vector Architecture: Beyond a Single Vector-IRAM , 1997 .
[87] Peter M. Kogge,et al. A low cost, multithreaded processing-in-memory system , 2004, WMPI '04.
[88] Kiyoung Choi,et al. PIM-enabled instructions: A low-overhead, locality-aware processing-in-memory architecture , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[89] Zain-ul-Abdin,et al. Kickstarting high-performance energy-efficient manycore architectures with Epiphany , 2014, 2014 48th Asilomar Conference on Signals, Systems and Computers.
[90] Martin Burtscher,et al. Bridging the processor-memory performance gap with 3D IC technology , 2005, IEEE Design & Test of Computers.
[91] G.E. Moore,et al. Cramming More Components Onto Integrated Circuits , 1998, Proceedings of the IEEE.
[92] Gabriel H. Loh,et al. A Processing-in-Memory Taxonomy and a Case for Studying Fixed-function PIM , 2013 .
[93] Jay B. Brockman,et al. PIM lite: a multithreaded processor-in-memory prototype , 2005, GLSVLSI '05.
[94] A. Kopser,et al. Overview of the Next Generation Cray XMT , 2011 .
[95] Steven Swanson,et al. Near-Data Processing: Insights from a MICRO-46 Workshop , 2014, IEEE Micro.
[96] Guoqi Zhang,et al. More than Moore: Creating High Value Micro/Nanoelectronics Systems , 2009 .
[97] Stéphan Jourdan,et al. Haswell: The Fourth-Generation Intel Core Processor , 2014, IEEE Micro.
[98] Jaejin Lee,et al. A 1.2 V 8 Gb 8-Channel 128 GB/s High-Bandwidth Memory (HBM) Stacked DRAM With Effective I/O Test Circuits , 2015, IEEE Journal of Solid-State Circuits.
[99] S. F. Reddaway. DAP—a distributed array processor , 1973, ISCA '73.
[100] Fong Pong,et al. Missing the Memory Wall: The Case for Processor/Memory Integration , 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).
[101] Robert A. Short,et al. CELLULAR ARRAYS FOR LOGIC AND STORAGE. , 1966 .
[102] Andreas Schilling,et al. High Performance Texture Mapping Architectures , 1996 .
[103] K. Yelick,et al. Intelligent RAM (IRAM): chips that remember and compute , 1997, 1997 IEEE International Solid-State Circuits Conference. Digest of Technical Papers.
[104] Yuan Xie. Future memory and interconnect technologies , 2013, 2013 Design, Automation & Test in Europe Conference & Exhibition (DATE).