On Improving User Response Times in Tableau

The rapid increase in data volumes and in the complexity of applied analytical tasks poses a major challenge for visualization solutions. It is important to keep the experience highly interactive, so that users stay engaged and can perform insightful data exploration. Query processing usually dominates the cost of visualization generation. Therefore, to achieve acceptable response times, one needs to utilize backend capabilities to the fullest and apply techniques such as caching or prefetching. In this paper, we discuss the key data processing components in Tableau: the query processor, query caches, the Tableau Data Engine [15, 40], and Data Server. Furthermore, we cover recent performance improvements related to the number and quality of remote queries, broader reuse of cached data, and the application of inter- and intra-query parallelism.
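The abstract mentions broader reuse of cached data without describing the mechanism. The following is a minimal, hypothetical sketch, not Tableau's implementation. It shows one common form of such reuse, in the spirit of answering queries using views [28]: a coarser aggregate query can be answered from a cached result computed at a finer grouping, provided the table and filters match and the aggregate is additive. The class `QueryResultCache`, its methods, and the treatment of filters as opaque normalized strings are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical result shape: group-by values -> SUM of a measure.
Result = Dict[Tuple[str, ...], float]


class QueryResultCache:
    """Illustrative SUM-aggregate cache; not Tableau's actual API."""

    def __init__(self) -> None:
        # (table, normalized filter string) -> list of (group-by columns, result)
        self._entries: Dict[Tuple[str, str], List[Tuple[Tuple[str, ...], Result]]] = defaultdict(list)

    def put(self, table: str, filt: str, group_by: Tuple[str, ...], result: Result) -> None:
        self._entries[(table, filt)].append((group_by, result))

    def get(self, table: str, filt: str, group_by: Tuple[str, ...]) -> Optional[Result]:
        # .get() avoids inserting empty lists into the defaultdict on a miss.
        for cached_cols, cached in self._entries.get((table, filt), []):
            if cached_cols == group_by:
                return dict(cached)  # exact hit: return a copy
            if set(group_by) <= set(cached_cols):
                # Coarser query: SUM is additive, so roll the finer result up.
                positions = [cached_cols.index(col) for col in group_by]
                rolled: Dict[Tuple[str, ...], float] = defaultdict(float)
                for key, value in cached.items():
                    rolled[tuple(key[p] for p in positions)] += value
                return dict(rolled)
        return None  # miss: the query would have to go to the backend


cache = QueryResultCache()
cache.put("sales", "year = 2014", ("region", "product"),
          {("East", "A"): 10.0, ("East", "B"): 5.0, ("West", "A"): 7.0})
print(cache.get("sales", "year = 2014", ("region",)))
# {('East',): 15.0, ('West',): 7.0}
```

The sketch handles SUM only. The same rollup works for other re-aggregable functions such as COUNT (by summing counts), MIN, and MAX, whereas AVG needs the cached SUM and COUNT, and COUNT DISTINCT generally cannot be rolled up this way. Filters must match exactly here; reasoning about filter containment would widen reuse further.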

[1] Parag Agrawal, et al. Scheduling shared scans of large data files, 2008, Proc. VLDB Endow.

[2] Jonathan Goldstein, et al. Optimizing queries using materialized views: a practical, scalable solution, 2001, SIGMOD '01.

[3] Mila E. Majster-Cederbaum, et al. Elimination of redundant operations in relational queries with general selection operators, 1985, Computing.

[4] Goetz Graefe, et al. Volcano - An Extensible and Parallel Query Evaluation System, 1994, IEEE Trans. Knowl. Data Eng.

[5] Peter Boncz, et al. Monet: A Next-Generation DBMS Kernel for Query-Intensive Applications, 2007.

[6] Alfred V. Aho, et al. Efficient optimization of a class of relational expressions, 1978, SIGMOD Conference.

[7] Marvin H. Solomon, et al. The GMAP: a versatile tool for physical data independence, 1996, The VLDB Journal.

[8] Martin L. Kersten, et al. An architecture for recycling intermediates in a column-store, 2009, SIGMOD Conference.

[9] Prasan Roy, et al. Efficient and extensible algorithms for multi query optimization, 1999, SIGMOD '00.

[10] Surajit Chaudhuri, et al. On the complexity of equivalence between recursive and nonrecursive Datalog programs, 1994, PODS '94.

[11] S. Sudarshan, et al. Pipelining in multi-query optimization, 2001, PODS '01.

[12] Jaideep Srivastava, et al. Multiple query optimization with Depth-First Branch-and-Bound and dynamic query ordering, 1993, CIKM '93.

[13] Ashok K. Chandra, et al. Optimal implementation of conjunctive queries in relational data bases, 1977, STOC '77.

[14] Norman May, et al. The SAP HANA Database - An Architecture Overview, 2012, IEEE Data Eng. Bull.

[15] Pawel Terlecki, et al. Leveraging compression in the Tableau Data Engine, 2014, SIGMOD Conference.

[16] Z. Meral Özsoyoglu, et al. On Efficient Reasoning with Implication Constraints, 1993, DOOD.

[17] Phokion G. Kolaitis, et al. On the complexity of the containment problem for conjunctive queries with built-in predicates, 1998, PODS '98.

[18] Nikolaus Ott, et al. Removing redundant join operations in queries involving views, 1985, Inf. Syst.

[19] Chris Stolte, et al. Dynamic workload driven data integration in Tableau, 2012, SIGMOD Conference.

[20] Marcin Zukowski, et al. Vectorwise: Beyond Column Stores, 2012, IEEE Data Eng. Bull.

[21] Alfred V. Aho, et al. The theory of joins in relational data bases, 1977, 18th Annual Symposium on Foundations of Computer Science (SFCS 1977).

[22] Daniel J. Abadi, et al. Column-stores vs. row-stores: how different are they really?, 2008, SIGMOD Conference.

[23] Prashant Malik, et al. Cassandra: a decentralized structured storage system, 2010, OPSR.

[24] Amr El Abbadi, et al. Multiple query optimization in middleware using query teamwork, 2005, Softw. Pract. Exp.

[25] Surajit Chaudhuri, et al. On the equivalence of recursive and nonrecursive Datalog programs, 1992, J. Comput. Syst. Sci.

[26] Surajit Chaudhuri, et al. Optimization of real conjunctive queries, 1993, PODS '93.

[27] Arie Segev, et al. Using common subexpressions to optimize multiple queries, 1988, Proceedings of the Fourth International Conference on Data Engineering.

[28] Alon Y. Halevy, et al. Answering queries using views: A survey, 2001, The VLDB Journal.

[29] Marcin Zukowski, et al. MonetDB/X100: Hyper-Pipelining Query Execution, 2005, CIDR.

[30] Werner Nutt, et al. Containment of Aggregate Queries, 2003, ICDT.

[31] Mihalis Yannakakis, et al. Equivalences Among Relational Expressions with the Union and Difference Operators, 1980, J. ACM.

[32] Patricia G. Selinger, et al. Access path selection in a relational database management system, 1979, SIGMOD '79.

[33] Divesh Srivastava, et al. Semantic Data Caching and Replacement, 1996, VLDB.

[34] Arnab Nandi, et al. Combining User Interaction, Speculative Query Execution and Sampling in the DICE System, 2014, Proc. VLDB Endow.

[35] Margaret H. Dunham, et al. Common Subexpression Processing in Multiple-Query Processing, 1998, IEEE Trans. Knowl. Data Eng.

[36] Kyuseok Shim, et al. Optimizing queries with materialized views, 1995, Proceedings of the Eleventh International Conference on Data Engineering.

[37] Pat Hanrahan, et al. Polaris: a system for query, analysis, and visualization of multidimensional databases, 2008, Commun. ACM.

[38] Alfred V. Aho, et al. Efficient optimization of a class of relational expressions, 1979, TODS.

[39] Timos K. Sellis, et al. Multiple-query optimization, 1988, TODS.

[40] Pawel Terlecki, et al. An analytic data engine for visualization in Tableau, 2011, SIGMOD '11.

[41] Herodotos Herodotou, et al. Massively Parallel Databases and MapReduce Systems, 2013, Found. Trends Databases.

[42] Hongjun Lu, et al. Workload Scheduling for Multiple Query Processing, 1995, Inf. Process. Lett.

[43] Matthias Jarke, et al. Common Subexpression Isolation in Multiple Query Optimization, 1984, Query Processing in Database Systems.

[44] Jingren Zhou, et al. Incorporating partitioning and parallel plans into the SCOPE optimizer, 2010, 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010).

[45] Hongjun Lu, et al. Scheduling Multiple Queries in Symmetric Multiprocessors, 1996, Inf. Sci.