Hardware acceleration of database operations

As the amount of memory in database systems grows, entire database tables, or even entire databases, can fit in a system's memory, making in-memory database operations more prevalent. This shift from disk-based to in-memory database systems has contributed to a move from row-wise to columnar data storage. Furthermore, common database workloads have grown beyond online transaction processing (OLTP) to include online analytical processing (OLAP) and data mining. These workloads analyze huge datasets that are often irregular and not indexed, making traditional database operations like joins much more expensive. In this paper, we explore using dedicated hardware to accelerate in-memory database operations. We present hardware to accelerate three primitives: selection, which compacts a single column into a linear column of selected data; merge join, which joins two sorted columns via merging; and sorting a column. Finally, we combine these primitives to accelerate an entire join operation. We implement a prototype of this system using FPGAs and show substantial improvements in both absolute throughput and utilization of memory bandwidth. Using the prototype as a guide, we explore how the hardware resources required by our design change with the desired throughput.
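
For readers unfamiliar with the primitives named in the abstract, the following is a minimal software sketch of what selection, merge join, and a sort-based join compute on columnar data. It is not the hardware design described in the paper. The function names (`select`, `merge_join`, `sort_merge_join`), the use of a boolean mask to mark selected rows, and the choice to return row-index pairs are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def select(column: Sequence[int], mask: Sequence[bool]) -> List[int]:
    """Compact a column: keep only the values whose mask entry is True,
    packed densely in their original order (mask format is an assumption)."""
    return [value for value, keep in zip(column, mask) if keep]


def merge_join(left: Sequence[int], right: Sequence[int]) -> List[Tuple[int, int]]:
    """Equi-join two already-sorted columns by merging.
    Returns (left_index, right_index) pairs whose values are equal."""
    pairs: List[Tuple[int, int]] = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            i += 1
        elif left[i] > right[j]:
            j += 1
        else:
            key = left[i]
            # Find the end of the run of equal keys on each side.
            i_end = i
            while i_end < len(left) and left[i_end] == key:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end] == key:
                j_end += 1
            # Every left row in the run matches every right row in the run.
            pairs.extend((a, b) for a in range(i, i_end) for b in range(j, j_end))
            i, j = i_end, j_end
    return pairs


def sort_merge_join(left: Sequence[int], right: Sequence[int]) -> List[Tuple[int, int]]:
    """Full join of two unsorted columns: sort both, then merge-join.
    Returned pairs refer to row positions in the original columns."""
    left_order = sorted(range(len(left)), key=left.__getitem__)
    right_order = sorted(range(len(right)), key=right.__getitem__)
    merged = merge_join([left[k] for k in left_order],
                        [right[k] for k in right_order])
    return [(left_order[a], right_order[b]) for a, b in merged]
```

In this sketch the full join is simply the sort primitive applied to each input followed by the merge primitive, which mirrors how the abstract describes combining the primitives into an entire join operation.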
