DataSense: Display-Agnostic Data Documentation

Documentation of data is critical for understanding the semantics of data, understanding how data was created, and raising awareness of data quality problems, errors, and assumptions. However, manually creating, maintaining, and exploring documentation is time-consuming and error-prone. In this work, we present our vision for display-agnostic data documentation (DAD), a novel data management paradigm that aids users in dealing with documentation for data. We introduce DataSense, a system implementing the DAD paradigm. Specifically, DataSense supports multiple types of documentation, from free-form text to structured information such as provenance and uncertainty annotations, as well as several display formats for documentation. DataSense automatically computes documentation for derived data. A user study we conducted with uncertainty documentation produced by DataSense demonstrates the benefits of documentation management.
