SeeDB: Efficient Data-Driven Visualization Recommendations to Support Visual Analytics

Data analysts often build visualizations as the first step in their analytical workflow. However, when working with high-dimensional datasets, identifying visualizations that show relevant or desired trends in the data can be laborious. We propose SeeDB, a visualization recommendation engine to facilitate fast visual analysis: given a subset of data to be studied, SeeDB intelligently explores the space of visualizations, evaluates promising visualizations for trends, and recommends those it deems most “useful” or “interesting”. The two major obstacles in recommending interesting visualizations are (a) scale: evaluating a large number of candidate visualizations while responding within interactive time scales, and (b) utility: identifying an appropriate metric for assessing the interestingness of visualizations. For the former, SeeDB introduces pruning optimizations to quickly identify high-utility visualizations and sharing optimizations to maximize sharing of computation across visualizations. For the latter, as a first step, we adopt a deviation-based metric for visualization utility, while indicating how it may be generalized to other factors influencing utility. We implement SeeDB as a middleware layer that can run on top of any DBMS. Our experiments show that our framework can identify interesting visualizations with high accuracy. Our optimizations yield speedups of multiple orders of magnitude on relational row and column stores and provide recommendations at interactive time scales. Finally, we demonstrate via a user study the effectiveness of our deviation-based utility metric and the value of recommendations in supporting visual analytics.
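The abstract does not define the deviation-based metric, so the following is only a minimal sketch of one plausible reading: a candidate visualization is a (dimension, measure) pair, its utility is the distance between the normalized aggregate distribution over the studied subset and the same distribution over a reference dataset, and the highest-scoring pairs are recommended. The function names, the choice of SUM as the aggregate, the Euclidean distance, and the sample rows are all assumptions made for illustration, not details taken from the paper. The pruning and sharing optimizations are not modeled here; every candidate is scored exhaustively.

```python
import math
from collections import defaultdict


def normalized_aggregate(rows, dimension, measure):
    """Compute SUM(measure) GROUP BY dimension, scaled so the values sum to 1.

    Assumption: SUM is used as the aggregate; other aggregates could be swapped in.
    """
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += row[measure]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {group: value / grand_total for group, value in totals.items()}


def deviation_utility(target_rows, reference_rows, dimension, measure):
    """Score one candidate view by how far the target deviates from the reference.

    Assumption: Euclidean distance between the two normalized distributions.
    """
    p = normalized_aggregate(target_rows, dimension, measure)
    q = normalized_aggregate(reference_rows, dimension, measure)
    groups = set(p) | set(q)
    return math.sqrt(sum((p.get(g, 0.0) - q.get(g, 0.0)) ** 2 for g in groups))


def recommend(target_rows, reference_rows, dimensions, measures, k=1):
    """Return the k (utility, dimension, measure) triples with the highest utility."""
    scored = [
        (deviation_utility(target_rows, reference_rows, d, m), d, m)
        for d in dimensions
        for m in measures
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


# Hypothetical sample data: the subset under study and a reference dataset.
target = [
    {"sex": "F", "region": "N", "gain": 10},
    {"sex": "M", "region": "S", "gain": 10},
]
reference = [
    {"sex": "F", "region": "N", "gain": 9},
    {"sex": "F", "region": "S", "gain": 9},
    {"sex": "M", "region": "N", "gain": 1},
    {"sex": "M", "region": "S", "gain": 1},
]

if __name__ == "__main__":
    for utility, dimension, measure in recommend(
        target, reference, ["sex", "region"], ["gain"], k=2
    ):
        print(f"{measure} by {dimension}: utility = {utility:.3f}")
```

In this toy data the distribution of gain by region is identical in the target and the reference, while the distribution by sex differs, so the view grouped by sex ranks first under the assumed metric.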
