Caribou: Intelligent Distributed Storage

The ever-increasing amount of data handled in data centers causes an intrinsic inefficiency: moving data around is expensive in terms of bandwidth, latency, and power consumption, especially given the low computational complexity of many database operations. In this paper we explore near-data processing in database engines, i.e., the option of offloading part of the computation directly to the storage nodes. We implement our ideas in Caribou, an intelligent distributed storage layer incorporating many of the lessons learned while building systems with specialized hardware. Caribou provides access to DRAM/NVRAM storage over the network through a simple key-value store interface, with each storage node providing high-bandwidth near-data processing at line rate and fault tolerance through replication. The result is a highly efficient, distributed, intelligent data storage layer that, thanks to its micro-server architecture, can be used both to boost performance and to reduce power consumption and real estate usage in the data center.
