Accelerating database workloads by software-hardware-system co-design

The key objective of this tutorial is to provide a broad yet in-depth survey of the emerging field of co-designing software, hardware, and system components to accelerate enterprise data management workloads. The tutorial has two goals. First, we provide a concise system-level characterization of different types of data management technologies, namely relational databases, NoSQL databases, and data stream management systems, from the perspective of analytical workloads. Using this characterization, we discuss opportunities for accelerating key data management workloads with software and hardware approaches. Second, we dive deeper into the hardware acceleration opportunities that Graphics Processing Units (GPUs) and Field-Programmable Gate Arrays (FPGAs) offer for the query execution pipeline. We also explore other hardware acceleration mechanisms, such as single-instruction multiple-data (SIMD) instructions, which enable short-vector data parallelism.
