Shared Execution Techniques for Business Data Analytics over Big Data Streams

Business data analytics requires processing large numbers of data streams and creating materialized views in order to provide near real-time answers to user queries. Materializing the view of each query and refreshing it continuously as a separate query execution plan is neither efficient nor scalable. In this paper, we present a global query execution plan that supports multiple queries simultaneously and minimizes the number of input scans, operators, and tuples flowing between operators. We propose shared-execution techniques for creating and maintaining materialized views in support of business data analytics queries. We exploit commonalities across multiple business data analytics queries to support scalable and efficient processing of big data streams. The paper highlights shared-execution techniques for select predicates, grouping, and aggregate calculations. We show how global query execution plans run in a distributed stream processing system called INGA, which is built on top of Storm. In INGA, we are able to support online view maintenance of 2500 materialized views using 237 queries by exploiting the constructs shared between the queries. We are able to run all 237 queries using a single global query execution plan tree with a depth of 21.
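As an illustration of the general idea of shared execution, the following minimal Python sketch runs several queries over a single pass of the input. It is not INGA's implementation and does not use Storm. The names `PREDICATES`, `Query`, and `SharedPlan`, and the tuple fields `region` and `amount`, are hypothetical and chosen only for this example. The sketch assumes that each distinct select predicate is evaluated once per tuple, and that queries with the same predicate and grouping attribute share one running (sum, count) state from which SUM, COUNT, and AVG views are derived.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical registry: each distinct select predicate is defined once.
PREDICATES = {
    "big_order": lambda t: t["amount"] >= 100,
    "all": lambda t: True,
}


@dataclass(frozen=True)
class Query:
    name: str
    predicate: str        # key into PREDICATES
    group_by: str         # grouping attribute
    aggregate: str        # "sum", "count", or "avg"
    value: str = "amount"


class SharedPlan:
    """One plan for many queries: shared predicates and shared group state."""

    def __init__(self, queries):
        self.queries = list(queries)
        # Queries with the same (predicate, group_by, value) share one node.
        self.nodes = {(q.predicate, q.group_by, q.value) for q in self.queries}
        self.used_predicates = {q.predicate for q in self.queries}
        # (node, group value) -> [running sum, running count]
        self.state = defaultdict(lambda: [0.0, 0])

    def process(self, tup):
        # Single scan: evaluate every distinct predicate once per tuple.
        passed = {p: PREDICATES[p](tup) for p in self.used_predicates}
        for node in self.nodes:
            pred, group_by, value = node
            if passed[pred]:
                acc = self.state[(node, tup[group_by])]
                acc[0] += tup[value]
                acc[1] += 1

    def view(self, query_name):
        # Derive a query's view from the shared state of its node.
        q = next(q for q in self.queries if q.name == query_name)
        node = (q.predicate, q.group_by, q.value)
        result = {}
        for (n, group), (s, c) in self.state.items():
            if n == node:
                result[group] = {"sum": s, "count": c, "avg": s / c}[q.aggregate]
        return result


queries = [
    Query("q1", "big_order", "region", "sum"),
    Query("q2", "big_order", "region", "avg"),
    Query("q3", "all", "region", "count"),
]
plan = SharedPlan(queries)
for t in [
    {"region": "EU", "amount": 150},
    {"region": "EU", "amount": 50},
    {"region": "US", "amount": 200},
    {"region": "EU", "amount": 250},
]:
    plan.process(t)
```

In this toy setup, the three queries are served by two shared nodes: `q1` and `q2` read the same (sum, count) state, so adding another aggregate over the same predicate and grouping attribute adds no per-tuple work.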
