NOAH: Interactive Spreadsheet Exploration with Dynamic Hierarchical Overviews

Spreadsheet systems are by far the most popular platform for data exploration on the planet, supporting millions of rows of data. However, exploring spreadsheets this large via operations such as scrolling or issuing formulae can be overwhelming and error-prone. Users easily lose context and suffer from cognitive and mechanical burdens while issuing formulae on data spanning multiple screens. To address these challenges, we introduce dynamic hierarchical overviews that are embedded alongside spreadsheets. Users can employ this overview to explore the data at various granularities, zooming in and out of the spreadsheet. They can issue formulae over data subsets without cumbersome scrolling or range selection, gaining a high- or low-level perspective of the spreadsheet. An implementation of our dynamic hierarchical overview, NOAH, integrated within DataSpread, preserves spreadsheet semantics and look and feel while introducing these enhancements. Our user studies demonstrate that NOAH makes it more intuitive, easier, and faster to navigate spreadsheet data compared to traditional spreadsheets like Microsoft Excel and spreadsheet plug-ins like Pivot Table, for a variety of exploration tasks; participants made fewer mistakes in NOAH and completed the tasks faster.

PVLDB Reference Format: Sajjadur Rahman, Mangesh Bendre, Yuyang Liu, Shichu Zhu, Zhaoyuan Su, Karrie Karahalios, and Aditya G. Parameswaran. NOAH: Interactive Spreadsheet Exploration with Dynamic Hierarchical Overviews. PVLDB, 14(6): 970-983, 2021. doi:10.14778/3447689.3447701

PVLDB Artifact Availability: The source code, data, and/or other artifacts have been made available at https://github.com/dataspread/NOAH.

∗This work began when these authors were part of the University of Illinois.

This work is licensed under the Creative Commons BY-NC-ND 4.0 International License. Visit https://creativecommons.org/licenses/by-nc-nd/4.0/ to view a copy of this license. For any use beyond those covered by this license, obtain permission by emailing info@vldb.org. Copyright is held by the owner/author(s). Publication rights licensed to the VLDB Endowment. Proceedings of the VLDB Endowment, Vol. 14, No. 6, ISSN 2150-8097.
