Heterogeneous Data-Centric Architectures for Modern Data-Intensive Applications: Case Studies in Machine Learning and Databases

Today's computing systems must move data back and forth between computing resources (e.g., CPUs, GPUs, accelerators) and off-chip main memory so that computation can take place on the data. Unfortunately, this data movement is a major bottleneck for system performance and energy consumption [1], [2]. One promising execution paradigm that alleviates the data movement bottleneck in modern and emerging applications is processing-in-memory (PIM) [2]–[12], which reduces the cost of moving data to and from main memory by placing computation capabilities close to memory. In the data-centric PIM paradigm, the logic close to memory can access data with significantly higher memory bandwidth, lower latency, and lower energy consumption than processors and accelerators in existing processor-centric systems.

[1]  Jeremie S. Kim,et al.  SeGraM: a universal hardware accelerator for genomic sequence-to-graph and sequence-to-sequence mapping , 2022, ISCA.

[2]  Geraldo F. Oliveira,et al.  Polynesia: Enabling High-Performance and Energy-Efficient Hybrid Transactional/Analytical Databases with Hardware/Software Co-Design , 2022, 2022 IEEE 38th International Conference on Data Engineering (ICDE).

[3]  G. Edward Suh,et al.  SecNDP: Secure Near-Data Processing with Untrusted Memory , 2022, 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA).

[4]  Hai Jin,et al.  Accelerating Graph Convolutional Networks Using Crossbar-based Processing-In-Memory Architectures , 2022, 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA).

[5]  Jeremie S. Kim,et al.  GenStore: A High-Performance and Energy-Efficient In-Storage Computing System for Genome Sequence Analysis , 2022, ArXiv.

[6]  Zhe Zhang,et al.  184QPS/W 64Mb/mm² 3D Logic-to-DRAM Hybrid Bonding with Process-Near-Memory Engine for Recommendation System , 2022, IEEE International Solid-State Circuits Conference.

[7]  Il Memming Park,et al.  A 1ynm 1.25V 8Gb, 16Gb/s/pin GDDR6-based Accelerator-in-Memory supporting 1TFLOPS MAC Operation and Various Activation Functions for Deep-Learning Applications , 2022, 2022 IEEE International Solid-State Circuits Conference (ISSCC).

[8]  Jeremie S. Kim,et al.  DR-STRaNGe: End-to-End System Design for DRAM-based True Random Number Generators , 2022, 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA).

[9]  Geraldo F. Oliveira,et al.  Casper: Accelerating Stencil Computation using Near-cache Processing , 2021, ArXiv.

[10]  Shaahin Angizi,et al.  PIM-Quantifier: A Processing-in-Memory Platform for mRNA Quantification , 2021, 2021 58th ACM/IEEE Design Automation Conference (DAC).

[11]  Yu Wang,et al.  Rerec: In-ReRAM Acceleration with Access-Aware Mapping for Personalized Recommendation , 2021, 2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD).

[12]  Jung Ho Ahn,et al.  TRiM: Enhancing Processor-Memory Interfaces with Scalable Tensor Reduction in Memory , 2021, MICRO.

[13]  Geraldo F. Oliveira,et al.  Benchmarking Memory-Centric Computing Systems: Analysis of Real Processing-In-Memory Hardware , 2021, 2021 12th International Green and Sustainable Computing Conference (IGSC).

[14]  Onur Mutlu,et al.  Google Neural Network Models for Edge Devices: Analyzing and Mitigating Machine Learning Inference Bottlenecks , 2021, 2021 30th International Conference on Parallel Architectures and Compilation Techniques (PACT).

[15]  Kevin Skadron,et al.  Ultra Efficient Acceleration for De Novo Genome Assembly via Near-Memory Computing , 2021, 2021 30th International Conference on Parallel Architectures and Compilation Techniques (PACT).

[16]  Xuan Zhang,et al.  Near-Memory Processing in Action: Accelerating Personalized Recommendation With AxDIMM , 2021, IEEE Micro.

[17]  Damla Senol Cali,et al.  FPGA-Based Near-Memory Acceleration of Modern Data-Intensive Applications , 2021, IEEE Micro.

[18]  Hang Liu,et al.  FORMS: Fine-grained Polarized ReRAM-based In-situ Computation for Mixed-signal DNN Accelerator , 2021, 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA).

[19]  Anant V. Nori,et al.  REDUCT: Keep it Close, Keep it Cool!: Efficient Scaling of DNN Inference on Multi-core CPUs with Near-Cache Compute , 2021, 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA).

[20]  O Seongil,et al.  Hardware Architecture and Software Stack for PIM Based on Commercial DRAM Technology: Industrial Product , 2021, 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA).

[21]  Kevin Skadron,et al.  Sieve: Scalable In-situ DRAM-based Accelerator Designs for Massively Parallel k-mer Matching , 2021, 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA).

[22]  Depei Qian,et al.  Distributed Graph Processing System and Processing-in-memory Architecture with Precise Loop-carried Dependency Guarantee , 2021, ACM Trans. Comput. Syst.

[23]  Amirali Boroumand.  Practical Mechanisms for Reducing Processor-Memory Data Movement in Modern Workloads , 2021 .

[24]  Onur Mutlu,et al.  QUAC-TRNG: High-Throughput True Random Number Generation Using Quadruple Row Activation in Commodity DRAM Chips , 2021, 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA).

[25]  Izzat El Hajj,et al.  Benchmarking a New Paradigm: An Experimental Analysis of a Real Processing-in-Memory Architecture , 2021, ArXiv.

[26]  Onur Mutlu,et al.  DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks , 2021, IEEE Access.

[27]  Torsten Hoefler,et al.  SISA: Set-Centric Instruction Set Architecture for Graph Mining on Processing-in-Memory Systems , 2021, MICRO.

[28]  Klaus Meyer-Wegener,et al.  COPRAO: A Capability Aware Query Optimizer for Reconfigurable Near Data Processors , 2021, 2021 IEEE 37th International Conference on Data Engineering Workshops (ICDEW).

[29]  Onur Mutlu,et al.  Polynesia: Enabling Effective Hybrid Transactional/Analytical Databases with Specialized Hardware/Software Co-Design , 2021, ArXiv.

[30]  Onur Mutlu,et al.  Mitigating Edge Machine Learning Inference Bottlenecks: An Empirical Study on Accelerating Google Edge Models , 2021, ArXiv.

[31]  O Seongil,et al.  25.4 A 20nm 6GB Function-In-Memory DRAM, Based on HBM2 with a 1.2TFLOPS Programmable Computing Unit Using Bank-Level Parallelism, for Machine Learning Applications , 2021, 2021 IEEE International Solid-State Circuits Conference (ISSCC).

[32]  Lei Deng,et al.  SpaceA: Sparse Matrix Vector Multiplication on Processing-in-Memory Accelerator , 2021, 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA).

[33]  Carole-Jean Wu,et al.  RecSSD: near data processing for solid state drive based recommendation inference , 2021, ASPLOS.

[34]  Nectarios Koziris,et al.  SynCron: Efficient Synchronization Support for Near-Data-Processing Architectures , 2021, 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA).

[35]  Ning-Hui Sun,et al.  PIM-Align: A Processing-in-Memory Architecture for FM-Index Search Algorithm , 2021, J. Comput. Sci. Technol.

[36]  Onur Mutlu,et al.  Intelligent Architectures for Intelligent Computing Systems , 2020, 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[37]  Onur Mutlu,et al.  SIMDRAM: a framework for bit-serial SIMD processing using DRAM , 2020, ASPLOS.

[38]  O. Mutlu,et al.  A Modern Primer on Processing in Memory , 2020, ArXiv.

[39]  Oscar Plata,et al.  NATSA: A Near-Data Processing Accelerator for Time Series Analysis , 2020, 2020 IEEE 38th International Conference on Computer Design (ICCD).

[40]  Sungjin Lee,et al.  AQUOMAN: An Analytic-Query Offloading Machine , 2020, 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[41]  Sachin S. Sapatnekar,et al.  MOUSE: Inference In Non-volatile Memory for Energy Harvesting Applications , 2020, 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[42]  Mohsen Imani,et al.  DUAL: Acceleration of Clustering Algorithms using Digital-based Processing In-Memory , 2020, 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[43]  Hyesoon Kim,et al.  Things to Consider to Enable Dynamic Graphs in Processing-in-Memory , 2020, International Symposium on Memory Systems.

[44]  Jeremie S. Kim,et al.  FIGARO: Improving System Performance via Fine-Grained In-DRAM Data Relocation and Caching , 2020, 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[45]  Rachata Ausavarungnirun,et al.  GenASM: A High-Performance, Low-Power Approximate String Matching Acceleration Framework for Genome Sequence Analysis , 2020, 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[46]  Wei Zhang,et al.  Exploring DNA Alignment-in-Memory Leveraging Emerging SOT-MRAM , 2020, ACM Great Lakes Symposium on VLSI.

[47]  David Atienza,et al.  BLADE: An in-Cache Computing Architecture for Edge Devices , 2020, IEEE Transactions on Computers.

[48]  Sander Stuijk,et al.  NERO: A Near High-Bandwidth Memory Stencil Accelerator for Weather Prediction Modeling , 2020, 2020 30th International Conference on Field-Programmable Logic and Applications (FPL).

[49]  Onur Mutlu,et al.  Intelligent Architectures for Intelligent Machines , 2020, 2020 International Symposium on VLSI Design, Automation and Test (VLSI-DAT).

[50]  Qi Liu,et al.  TiDB , 2020, Proc. VLDB Endow.

[51]  Onur Mutlu,et al.  Accelerating Genome Analysis: A Primer on an Ongoing Journey , 2020, IEEE Micro.

[52]  Eunhyeok Park,et al.  McDRAM v2: In-Dynamic Random Access Memory Systolic Array Accelerator to Address the Large Model Problem in Deep Neural Networks on the Edge , 2020, IEEE Access.

[53]  Nikil D. Dutt,et al.  CryptoPIM: In-memory Acceleration for Lattice-based Cryptographic Hardware , 2020, 2020 57th ACM/IEEE Design Automation Conference (DAC).

[54]  Edward A. Lee,et al.  Q-PIM: A Genetic Algorithm based Flexible DNN Quantization Method and Application to Processing-In-Memory Platform , 2020, 2020 57th ACM/IEEE Design Automation Conference (DAC).

[55]  X. Hu,et al.  Computing-in-Memory for Performance and Energy-Efficient Homomorphic Encryption , 2020, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[56]  Yu Huang,et al.  A Heterogeneous PIM Hardware-Software Co-Design for Energy-Efficient Graph Processing , 2020, 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS).

[57]  Yu Huang,et al.  Spara: An Energy-Efficient ReRAM-Based Accelerator for Sparse Graph Analytics Applications , 2020, 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS).

[58]  Sachin S. Sapatnekar,et al.  A DNA Read Alignment Accelerator Based on Computational RAM , 2020, IEEE Journal on Exploratory Solid-State Computational Devices and Circuits.

[59]  Wei Zhang,et al.  PIM-Aligner: A Processing-in-MRAM Platform for Biological Sequence Alignment , 2020, 2020 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[60]  Wang Ling Goh,et al.  Performance Analysis of Convolutional Neural Network Using Multi-level Memristor Crossbar for Edge Computing , 2020, 2020 3rd International Conference on Intelligent Autonomous Systems (ICoIAS).

[61]  Masoud Daneshtalab,et al.  NoM: Network-on-Memory for Inter-Bank Data Transfer in Highly-Banked Memories , 2020, IEEE Computer Architecture Letters.

[62]  Yiran Chen,et al.  PARC: A Processing-in-CAM Architecture for Genomic Long Read Pairwise Alignment using ReRAM , 2020, 2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC).

[63]  Martin D. Schatz,et al.  RecNMP: Accelerating Personalized Recommendation with Near-Memory Processing , 2019, 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA).

[64]  Eduardo Cunha de Almeida,et al.  Database Processing-in-Memory: An Experimental Study , 2019, Proc. VLDB Endow.

[65]  Rajeev Balasubramonian,et al.  GenCache: Leveraging In-Cache Operators for Efficient Sequence Alignment , 2019, MICRO.

[66]  Yanzhi Wang,et al.  GraphQ: Scalable PIM-Based Graph Processing , 2019, MICRO.

[67]  Meng-Fan Chang,et al.  Circuit Design Challenges in Computing-in-Memory for AI Edge Devices , 2019, 2019 IEEE 13th International Conference on ASIC (ASICON).

[68]  Yifan Gong,et al.  Improving RNN Transducer Modeling for End-to-End Speech Recognition , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).

[69]  Shanshan Lu,et al.  Agile Query Processing in Statistical Databases: A Process-In-Memory Approach , 2019, KSEM.

[70]  Minsoo Rhu,et al.  TensorDIMM: A Practical Near-Memory Processing Architecture for Embeddings and Tensor Operations in Deep Learning , 2019, MICRO.

[71]  Onur Mutlu,et al.  Processing-in-memory: A workload-driven perspective , 2019, IBM J. Res. Dev.

[72]  Hamid Pirahesh,et al.  WiSer: A Highly Available HTAP DBMS for IoT Applications , 2019, 2019 IEEE International Conference on Big Data (Big Data).

[73]  Fabrice Devaux,et al.  The true Processing In Memory accelerator , 2019, 2019 IEEE Hot Chips 31 Symposium (HCS).

[74]  Mohsen Imani,et al.  RAPID: A ReRAM Processing in-Memory Architecture for DNA Sequence Alignment , 2019, 2019 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED).

[75]  Yuan Qi,et al.  TitAnt: Online Real-time Transaction Fraud Detection in Ant Financial , 2019, Proc. VLDB Endow.

[76]  Shaahin Angizi,et al.  AlignS: A Processing-In-Memory Accelerator for DNA Short Read Alignment Leveraging SOT-MRAM , 2019, 2019 56th ACM/IEEE Design Automation Conference (DAC).

[77]  Sander Stuijk,et al.  NAPEL: Near-Memory Computing Application Performance Prediction via Ensemble Learning , 2019, 2019 56th ACM/IEEE Design Automation Conference (DAC).

[78]  Rachata Ausavarungnirun,et al.  CoNDA: Efficient Cache Coherence Support for Near-Data Accelerators , 2019, 2019 ACM/IEEE 46th Annual International Symposium on Computer Architecture (ISCA).

[79]  Tajana Simunic,et al.  FloatPIM: In-Memory Acceleration of Deep Neural Network Training with High Precision , 2019, 2019 ACM/IEEE 46th Annual International Symposium on Computer Architecture (ISCA).

[80]  Onur Mutlu,et al.  In-DRAM Bulk Bitwise Execution Engine , 2019, ArXiv.

[81]  Shaahin Angizi,et al.  GraphiDe: A Graph Processing Accelerator leveraging In-DRAM-Computing , 2019, ACM Great Lakes Symposium on VLSI.

[82]  Tajana Simunic,et al.  DigitalPIM: Digital-based Processing In-Memory for Big Data Acceleration , 2019, ACM Great Lakes Symposium on VLSI.

[83]  Rachata Ausavarungnirun,et al.  Enabling Practical Processing in and near Memory for Data-Intensive Computing , 2019, DAC.

[84]  Yu Wang,et al.  GraphH: A Processing-in-Memory Architecture for Large-Scale Graph Processing , 2019, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[85]  Ying Wang,et al.  Leveraging Memory PUFs and PIM-based encryption to secure edge deep learning systems , 2019, 2019 IEEE 37th VLSI Test Symposium (VTS).

[86]  Carole-Jean Wu,et al.  Machine Learning at Facebook: Understanding Inference at the Edge , 2019, 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[87]  Yuan Xie,et al.  Near-Data Acceleration of Privacy-Preserving Biomarker Search with 3D-Stacked Memory , 2019, 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[88]  Rachata Ausavarungnirun,et al.  Processing Data Where It Makes Sense: Enabling In-Memory Computation , 2019, Microprocess. Microsystems.

[89]  Shaahin Angizi,et al.  GraphS: A Graph Processing Accelerator Leveraging SOT-MRAM , 2019, 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[90]  Lee-Sup Kim,et al.  NAND-Net: Minimizing Computational Complexity of In-Memory Processing for Binary Neural Networks , 2019, 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[91]  Tajana Simunic,et al.  GRAM: graph processing in a ReRAM-based computational memory , 2019, ASP-DAC.

[92]  Lei Han,et al.  ReRAM-based processing-in-memory architecture for blockchain platforms , 2019, ASP-DAC.

[93]  Yu Wang,et al.  GraphSAR: a sparsity-aware processing-in-memory architecture for large-scale graph processing on ReRAMs , 2019, ASP-DAC.

[94]  Ran Ginosar,et al.  POSTER: BioSEAL: In-Memory Biological Sequence Alignment Accelerator for Large-Scale Genomic Data , 2019, 2019 28th International Conference on Parallel Architectures and Compilation Techniques (PACT).

[95]  Tara N. Sainath,et al.  Streaming End-to-end Speech Recognition for Mobile Devices , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[96]  Eunhyeok Park,et al.  McDRAM: Low Latency and Energy-Efficient Matrix Computations in DRAM , 2018, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[97]  Yuan Xie,et al.  GraphIA: an in-situ accelerator for large-scale graph processing , 2018, MEMSYS.

[98]  Shaahin Angizi,et al.  PIM-TGAN: A Processing-in-Memory Accelerator for Ternary Generative Adversarial Networks , 2018, 2018 IEEE 36th International Conference on Computer Design (ICCD).

[99]  Rachata Ausavarungnirun,et al.  The Processing-in-Memory Paradigm: Mechanisms to Enable Adoption , 2018, Beyond-CMOS Technologies for Next Generation Computer Design.

[100]  Onur Mutlu,et al.  D-RaNGe: Using Commodity DRAM Devices to Generate True Random Numbers with Low Latency and High Throughput , 2018, 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[101]  Zhengping Qian,et al.  Real-time Constrained Cycle Detection in Large Dynamic Graphs , 2018, Proc. VLDB Endow.

[102]  Vivienne Sze,et al.  Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices , 2018, IEEE Journal on Emerging and Selected Topics in Circuits and Systems.

[103]  Saibal Mukhopadhyay,et al.  ReRAM-Based Processing-in-Memory Architecture for Recurrent Neural Network Acceleration , 2018, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[104]  Jun Yang,et al.  DrAcc: a DRAM based Accelerator for Accurate CNN Inference , 2018, 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC).

[105]  David Blaauw,et al.  Neural Cache: Bit-Serial In-Cache Acceleration of Deep Neural Networks , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).

[106]  Luigi Carro,et al.  HIPE: HMC instruction predication extension applied on database processing , 2018, 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[107]  J. Cong,et al.  Scaling for edge inference of deep neural networks , 2018 .

[108]  Rachata Ausavarungnirun,et al.  Google Workloads for Consumer Devices: Mitigating Data Movement Bottlenecks , 2018, ASPLOS.

[109]  Huazhong Yang,et al.  Bidirectional Database Storage and SQL Query Exploiting RRAM-Based Process-in-Memory Structure , 2018, ACM Trans. Storage.

[110]  Duo Liu,et al.  A Novel ReRAM-Based Processing-in-Memory Architecture for Graph Traversal , 2018, ACM Trans. Storage.

[111]  Christoforos E. Kozyrakis,et al.  GraphP: Reducing Communication for PIM-Based Graph Processing with Efficient Data Partition , 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[112]  Onur Mutlu,et al.  The DRAM Latency PUF: Quickly Evaluating Physical Unclonable Functions by Exploiting the Latency-Reliability Tradeoff in Modern Commodity DRAM Devices , 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[113]  Shaahin Angizi,et al.  IMCE: Energy-efficient bit-wise in-memory convolution engine for deep neural network , 2018, 2018 23rd Asia and South Pacific Design Automation Conference (ASP-DAC).

[114]  Rohit Prabhavalkar,et al.  Exploring architectures, data and units for streaming end-to-end speech recognition with RNN-transducer , 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).

[115]  Onur Mutlu,et al.  GRIM-Filter: Fast seed location filtering in DNA read mapping using processing-in-memory technologies , 2017, BMC Genomics.

[116]  Yuan Xie,et al.  DRISA: A DRAM-based Reconfigurable In-Situ Accelerator , 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[117]  Onur Mutlu,et al.  Ambit: In-Memory Accelerator for Bulk Bitwise Operations Using Commodity DRAM Technology , 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[118]  Vladimir Vlassov,et al.  Identifying the potential of near data processing for apache spark , 2017, MEMSYS.

[119]  Jens Dittrich,et al.  Accelerating Analytical Processing in MVCC using Fine-Granular High-Frequency Virtual Snapshotting , 2017, SIGMOD Conference.

[120]  Yiran Chen,et al.  GraphR: Accelerating Graph Processing Using ReRAM , 2017, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[121]  Xu Chen,et al.  KunPeng: Parameter Server based Distributed Learning Systems and Its Applications in Alibaba and Ant Financial , 2017, KDD.

[122]  Tao Li,et al.  A novel ReRAM-based processing-in-memory architecture for graph computing , 2017, 2017 IEEE 6th Non-Volatile Memory Systems and Applications Symposium (NVMSA).

[123]  Christoph Hagleitner,et al.  Boosting the Efficiency of HPCG and Graph500 with Near-Data Processing , 2017, 2017 46th International Conference on Parallel Processing (ICPP).

[124]  Maurice Herlihy,et al.  Concurrent Data Structures for Near-Memory Computing , 2017, SPAA.

[125]  Babak Falsafi,et al.  The mondrian data engine , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).

[126]  Yuanyuan Tian,et al.  Hybrid Transactional/Analytical Processing: A Survey , 2017, SIGMOD Conference.

[127]  Gustavo Alonso,et al.  BatchDB: Efficient Isolated Execution of Hybrid OLTP+OLAP Workloads for Interactive Applications , 2017, SIGMOD Conference.

[128]  Christoforos E. Kozyrakis,et al.  TETRIS: Scalable and Efficient Neural Network Acceleration with 3D Memory , 2017, ASPLOS.

[129]  Luigi Carro,et al.  NIM: An HMC-Based Machine for Neuron Computation , 2017, ARC.

[130]  Luigi Carro,et al.  Operand size reconfiguration for big data processing in memory , 2017, Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017.

[131]  Ramyad Hadidi,et al.  GraphPIM: Enabling Instruction-Level PIM Offloading in Graph Computing Frameworks , 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[132]  Jing Wang,et al.  Processing-in-Memory Enabled Graphics Processors for 3D Rendering , 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[133]  Onur Mutlu,et al.  Buddy-RAM: Improving the Performance and Efficiency of Bulk Bitwise Operations Using DRAM , 2016, ArXiv.

[134]  Onur Mutlu,et al.  The Processing Using Memory Paradigm: In-DRAM Bulk Copy, Initialization, Bitwise AND and OR , 2016, ArXiv.

[135]  Onur Mutlu,et al.  Accelerating pointer chasing in 3D-stacked memory: Challenges, mechanisms, evaluation , 2016, 2016 IEEE 34th International Conference on Computer Design (ICCD).

[136]  Mahmut T. Kandemir,et al.  Scheduling techniques for GPU architectures with processing-in-memory capabilities , 2016, 2016 International Conference on Parallel Architecture and Compilation Techniques (PACT).

[137]  Chuan-Ming Liu,et al.  Big data stream computing in healthcare real-time analytics , 2016, 2016 IEEE International Conference on Cloud Computing and Big Data Analysis (ICCCBDA).

[138]  Onur Mutlu,et al.  Transparent Offloading and Mapping (TOM): Enabling Programmer-Transparent Near-Data Processing in GPU Systems , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[139]  Jinyoung Lee,et al.  Biscuit: A Framework for Near-Data Processing of Big Data Workloads , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[140]  Yu Wang,et al.  PRIME: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-Based Main Memory , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[141]  Sudhakar Yalamanchili,et al.  Neurocube: A Programmable Digital Neuromorphic Architecture with High-Density 3D Memory , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[142]  Andrew Pavlo,et al.  Bridging the Archipelago between Row-Stores and Column-Stores for Hybrid Workloads , 2016, SIGMOD Conference.

[143]  Cong Xu,et al.  Pinatubo: A processing-in-memory architecture for bulk bitwise operations in emerging non-volatile memories , 2016, 2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC).

[144]  Miao Hu,et al.  ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[145]  Liu Liu,et al.  Leveraging 3D technologies for hardware security: Opportunities and challenges , 2016, 2016 International Great Lakes Symposium on VLSI (GLSVLSI).

[146]  Luca Benini,et al.  Design and Evaluation of a Processing-in-Memory Architecture for the Smart Memory Cube , 2016, ARCS.

[147]  Onur Mutlu,et al.  Low-Cost Inter-Linked Subarrays (LISA): Enabling fast inter-subarray data movement in DRAM , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[148]  Christoforos E. Kozyrakis,et al.  Practical Near-Data Processing for In-Memory Analytics Frameworks , 2015, 2015 International Conference on Parallel Architecture and Compilation (PACT).

[149]  Hyesoon Kim,et al.  Instruction Offloading with HMC 2.0 Standard: A Case Study for Graph Traversals , 2015, MEMSYS.

[150]  Weiyun Huang,et al.  Real-Time Analytical Processing with SQL Server , 2015, Proc. VLDB Endow.

[151]  Onur Mutlu,et al.  Fast Bulk Bitwise AND and OR in DRAM , 2015, IEEE Computer Architecture Letters.

[152]  Kiyoung Choi,et al.  A scalable processing-in-memory accelerator for parallel graph processing , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).

[153]  Kiyoung Choi,et al.  PIM-enabled instructions: A low-overhead, locality-aware processing-in-memory architecture , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).

[154]  Manos Athanassoulis,et al.  Beyond the Wall: Near-Data Processing for Databases , 2015, DaMoN.

[155]  Stratos Idreos,et al.  JAFAR: Near-Data Processing for Databases , 2015, SIGMOD Conference.

[156]  Rasit O. Topaloglu,et al.  More than Moore Technologies for Next Generation Computer Design , 2015 .

[157]  Trevor Darrell,et al.  Long-term recurrent convolutional networks for visual recognition and description , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[158]  Samy Bengio,et al.  Show and tell: A neural image caption generator , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[159]  Onur Mutlu,et al.  Research Problems and Opportunities in Memory Systems , 2014, Supercomput. Front. Innov.

[160]  Sally Chisholm.  Adopting medical technologies and diagnostics recommended by NICE: the Health Technologies Adoption Programme , 2014, Annals of the Royal College of Surgeons of England.

[161]  Andrew W. Senior,et al.  Long Short-Term Memory Based Recurrent Neural Network Architectures for Large Vocabulary Speech Recognition , 2014, ArXiv.

[162]  Rachata Ausavarungnirun,et al.  RowClone: Fast and energy-efficient in-DRAM bulk data copy and initialization , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[163]  Henk Corporaal,et al.  Memory-centric accelerator design for Convolutional Neural Networks , 2013, 2013 IEEE 31st International Conference on Computer Design (ICCD).

[164]  Wolfgang Lehner,et al.  SAP HANA: The Evolution from a Modern Main-Memory Data Platform to an Enterprise Application Platform , 2013, Proc. VLDB Endow.

[165]  Onur Mutlu,et al.  Memory scaling: A systems architecture perspective , 2013, 2013 5th IEEE International Memory Workshop.

[166]  Alfons Kemper,et al.  HyPer: A hybrid OLTP&OLAP main memory database system based on virtual memory snapshots , 2011, 2011 IEEE 27th International Conference on Data Engineering.

[167]  Jayanthi Ranjan,et al.  Real time business intelligence in supply chain analytics , 2008, Inf. Manag. Comput. Secur.

[168]  Jon T. S. Quah,et al.  Real Time Credit Card Fraud Detection using Computational Intelligence , 2007, 2007 International Joint Conference on Neural Networks.

[169]  Izzat El Hajj,et al.  Benchmarking a New Paradigm: Experimental Analysis and Characterization of a Real Processing-in-Memory System , 2022, IEEE Access.

[170]  Zhe Zhang,et al.  Practical Near-Data-Processing Architecture for Large-Scale Distributed Graph Neural Network , 2022, IEEE Access.

[171]  Hai Jin,et al.  ReGra: Accelerating Graph Traversal Applications Using ReRAM With Lower Communication Cost , 2020, IEEE Access.

[172]  Jungmin Kim,et al.  Optimizing Data Movement with Near-Memory Acceleration of In-memory DBMS , 2020, EDBT.

[173]  Hemangee K. Kapoor,et al.  Towards Near Data Processing of Convolutional Neural Networks , 2018, 2018 31st International Conference on VLSI Design and 2018 17th International Conference on Embedded Systems (VLSID).

[174]  Onur Mutlu,et al.  LazyPIM: An Efficient Cache Coherence Mechanism for Processing-in-Memory , 2017, IEEE Computer Architecture Letters.

[175]  Onur Mutlu,et al.  Chapter Four - Simple Operations in Memory to Reduce Data Movement , 2017, Adv. Comput.

[176]  Barzan Mozafari,et al.  SnappyData: Streaming, Transactions, and Interactive Analytics in a Unified Engine , 2016 .

[177]  Onur Mutlu,et al.  Main Memory Scaling: Challenges and Solution Directions , 2015 .