Enabling High-Performance and Energy-Efficient Hybrid Transactional/Analytical Databases with Hardware/Software Cooperation

Growth in data volume, combined with increasing demand for real-time analysis (using the most recent data), has resulted in the emergence of database systems that concurrently support transactions and data analytics. These hybrid transactional and analytical processing (HTAP) database systems can support real-time data analysis without the high costs of synchronizing across separate single-purpose databases. Unfortunately, for many applications that perform a high rate of data updates, state-of-the-art HTAP systems incur significant losses in transactional (up to 74.6%) and/or analytical (up to 49.8%) throughput compared to performing only transactional or only analytical queries in isolation. These losses stem from (1) data movement between the CPU and memory, (2) data update propagation from transactional to analytical workloads, and (3) the cost to maintain a consistent view of data across the system. We propose Polynesia, a hardware–software co-designed system for in-memory HTAP databases that avoids the large throughput losses of traditional HTAP systems. Polynesia (1) divides the HTAP system into transactional and analytical processing islands, (2) implements new custom hardware that unlocks software optimizations to reduce the costs of update propagation and consistency, and (3) exploits processing-in-memory for the analytical islands to alleviate data movement overheads. Our evaluation shows that Polynesia outperforms three state-of-the-art HTAP systems, with average transactional/analytical throughput improvements of 1.7×/3.7×, and reduces energy consumption by 48% over the prior lowest-energy HTAP system.

Geraldo F. Oliveira | Amirali Boroumand | O. Mutlu | Saugata Ghose
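The update-propagation and consistency costs named in the abstract can be illustrated with a toy model of two islands. This Python sketch is hypothetical and is not Polynesia's implementation: it assumes a row-oriented transactional copy, a column-oriented analytical copy, and a simple batched update log, and every name in it (`TransactionalIsland`, `AnalyticalIsland`, `propagate`) is invented for illustration. It shows that each update must be shipped and re-applied in the other layout, and that analytical queries must read a stable version rather than half-applied state. The custom hardware and processing-in-memory parts of the design are not modeled.

```python
from dataclasses import dataclass, field

# Hypothetical toy model of two HTAP "islands"; not Polynesia's actual design.


@dataclass
class TransactionalIsland:
    """Row-oriented copy (assumed layout): row_id -> {column: value}."""
    rows: dict = field(default_factory=dict)
    # Updates not yet shipped to the analytical island: (row_id, column, value).
    update_log: list = field(default_factory=list)
    version: int = 0  # incremented once per committed write

    def write(self, row_id, column, value):
        self.rows.setdefault(row_id, {})[column] = value
        self.update_log.append((row_id, column, value))
        self.version += 1


@dataclass
class AnalyticalIsland:
    """Column-oriented copy (assumed layout): column -> {row_id: value}."""
    columns: dict = field(default_factory=dict)
    applied_version: int = 0  # last transactional version reflected here

    def apply_updates(self, updates, version):
        # Re-apply each row update in the column layout (the propagation cost).
        for row_id, column, value in updates:
            self.columns.setdefault(column, {})[row_id] = value
        self.applied_version = version

    def snapshot(self):
        # Copy every column so later propagation cannot change what a running
        # analytical query sees (the consistency cost).
        return self.applied_version, {c: dict(v) for c, v in self.columns.items()}


def propagate(txn, ana):
    """Ship all pending updates in one batch and clear the transactional log."""
    batch, txn.update_log = txn.update_log, []
    ana.apply_updates(batch, txn.version)


if __name__ == "__main__":
    txn, ana = TransactionalIsland(), AnalyticalIsland()
    txn.write(1, "balance", 100)
    txn.write(2, "balance", 50)
    propagate(txn, ana)
    txn.write(1, "balance", 80)  # committed, but not yet visible to analytics
    version, cols = ana.snapshot()
    print(version, sum(cols["balance"].values()))  # 2 150
```

In this toy model, every propagation re-writes data into a different layout and every snapshot copies whole columns. A practical system would avoid full copies (for example, with multiversioning), but the sketch makes visible the kinds of overhead that the abstract says Polynesia targets through hardware/software co-design.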