Paper Information - SparkCruise: Handsfree Computation Reuse in Spark

SparkCruise: Handsfree Computation Reuse in Spark

Interactive data analytics often repeats the same computations across multiple queries. These redundancies degrade query performance and raise the overall cost of interactive query sessions. Reusing these common computations could yield cost savings, but it is difficult for users to manually detect and reuse them in fast-moving interactive sessions. In this paper, we propose to demonstrate SparkCruise, a computation reuse system that automatically selects the most useful common computations to materialize based on the past query workload. SparkCruise materializes these computations as part of query processing, so users can continue with their query processing just as before while computation reuse is applied automatically in the background, all without any modifications to the Spark code. We will invite the audience to play with several scenarios, such as workload redundancy insights and pay-as-you-go materialization, highlighting the utility of SparkCruise.

PVLDB Reference Format: Abhishek Roy, Alekh Jindal, Hiren Patel, Ashit Gosalia, Subru Krishnan, Carlo Curino. SparkCruise: Handsfree Computation Reuse in Spark. PVLDB, 12(12): 1850-1853, 2019. DOI: https://doi.org/10.14778/3352063.3352082
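The abstract's idea of selecting common computations from a past workload can be illustrated with a toy sketch. This is not SparkCruise's actual algorithm or API: the tuple-based plan format, the function names, and the scoring rule (extra occurrences times subtree size) are all assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical toy plan format: (operator, argument, *children).

def subexpressions(plan):
    """Yield every subtree of a plan, including the plan itself."""
    yield plan
    for child in plan[2:]:
        yield from subexpressions(child)

def size(plan):
    """Number of operators in a subtree; a crude stand-in for its cost."""
    return 1 + sum(size(child) for child in plan[2:])

def select_candidates(workload, top_k=1):
    """Rank subtrees shared by at least two queries by estimated saved work."""
    counts = Counter()
    for plan in workload:
        # Count each distinct subtree at most once per query.
        counts.update(set(subexpressions(plan)))
    shared = [(sub, n) for sub, n in counts.items() if n >= 2]
    # Assumed score: each reuse after the first avoids recomputing the subtree.
    shared.sort(key=lambda item: (item[1] - 1) * size(item[0]), reverse=True)
    return [sub for sub, _ in shared[:top_k]]

# A made-up past workload of three queries.
scan = ("scan", "sales")
filt = ("filter", "year = 2019", scan)
q1 = ("aggregate", "sum(amount) by region", filt)
q2 = ("aggregate", "count(*) by product", filt)
q3 = ("project", "name", ("scan", "customers"))

best = select_candidates([q1, q2, q3])
```

Here `best` contains the shared filter-over-scan subtree, since it appears in two queries and is larger than the bare scan. A real system would need semantic plan signatures and actual cost and size estimates rather than structural equality and operator counts.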
