Query Processing on Tensor Computation Runtimes

The huge demand for computation in artificial intelligence (AI) is driving unparalleled investments in hardware and software systems for AI. This has led to an explosion in the number of specialized hardware devices, which are now offered by major cloud vendors. By hiding low-level complexity behind a tensor-based interface, tensor computation runtimes (TCRs) such as PyTorch allow data scientists to efficiently exploit the capabilities of the new hardware. In this paper, we explore how database management systems can ride the wave of innovation happening in the AI space. We design, build, and evaluate Tensor Query Processor (TQP), which transforms SQL queries into tensor programs and executes them on TCRs. TQP runs the full TPC-H benchmark by implementing novel algorithms for relational operators on top of tensor routines. At the same time, TQP supports various hardware while requiring only a fraction of the usual development effort. Experiments show that TQP can improve query execution time by up to 10× over specialized CPU- and GPU-only systems. Finally, TQP can accelerate queries that mix ML predictions and SQL end-to-end, delivering up to 9× speedup over CPU baselines.
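To make the idea of expressing a SQL query as a tensor program concrete, the following is a minimal sketch in PyTorch of a filter followed by a grouped sum. It is an illustration only and not TQP's actual implementation: the column names, the sample data, the dictionary-encoded group key, and the helper `filter_group_sum` are all assumptions introduced here.

```python
import torch

# Hypothetical columnar table: each column is a 1-D tensor of equal length.
# The group key is assumed to be dictionary-encoded as integers 0..num_groups-1.
returnflag = torch.tensor([0, 1, 0, 2, 1, 0])
quantity = torch.tensor([10.0, 5.0, 7.0, 3.0, 8.0, 2.0])
discount = torch.tensor([0.06, 0.02, 0.10, 0.07, 0.09, 0.01])


def filter_group_sum(key, value, predicate_col, threshold, num_groups):
    """Roughly: SELECT key, SUM(value) FROM t WHERE predicate_col > threshold GROUP BY key."""
    # Selection: evaluate the predicate on the whole column, producing a boolean mask.
    mask = predicate_col > threshold
    kept_key = key[mask]
    kept_value = value[mask]
    # Aggregation: scatter-add each surviving value into the slot of its group.
    sums = torch.zeros(num_groups, dtype=value.dtype, device=value.device)
    sums.index_add_(0, kept_key, kept_value)
    return sums


result = filter_group_sum(returnflag, quantity, discount, 0.05, num_groups=3)
print(result.tolist())
```

Because the whole query is written with tensor operations, the same code could in principle run on another device by moving the input tensors there (for example with `.to("cuda")`), which is the kind of hardware portability the abstract refers to.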
