The Case for Network Accelerated Query Processing

The fastest plans in MPP databases are usually those with the least data movement across nodes, as data is not processed while in transit. The network switches that connect MPP nodes are hard-wired to perform packet-forwarding logic only. However, in a recent paradigm shift, network devices are becoming “programmable.” The quotes here are cautionary. Switches are not becoming general-purpose computers (just yet), but the set of tasks they can perform can now be encoded in software. In this paper we explore this programmability to accelerate OLAP queries. We determined that we can offload onto the switch some very common and expensive query patterns. Thus, for the first time, moving data through networking equipment can contribute to query execution. Our preliminary results show that we can improve response times on even the best agreed-upon plans by more than 2x using 25 Gbps networks. We also see the promise of linear performance improvement with faster speeds. The use of programmable switches can open new possibilities for architecting rack- and datacenter-sized database systems, with implications across the stack.
