Unravel: A Fluent Code Explorer for Data Wrangling

Data scientists have adopted a popular design pattern in programming called the fluent interface for composing data wrangling code. The fluent interface works by chaining multiple transformations on a data table, or dataframe, into a single expression that produces an output. Although fluent code promotes legibility, the intermediate dataframes are lost, forcing data scientists to unravel the chain through tedious code edits and re-execution. Existing tools for data scientists do not make fluent code easy to explore or understand. To address this gap, we designed Unravel, a tool that helps data scientists explore and understand fluent code. To support exploration, data scientists can apply simple structural edits via drag-and-drop and toggle switch interactions to reorder and (un)comment lines. To support understanding, Unravel provides function summaries and always-on visualizations that highlight important changes to a dataframe. We discuss the design motivations behind Unravel and how it helps data scientists understand and explore fluent code. In a first-use study with 14 data scientists, we found that Unravel facilitated diverse activities such as validating assumptions about the code or data, exploring alternatives, and revealing function behavior.
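The following is a minimal sketch of the problem described above, not Unravel's implementation. It assumes a hypothetical `Table` class standing in for a dataframe, with hypothetical `filter`, `mutate`, and `arrange` methods, and a hypothetical helper `unravel` that replays a chain step by step. It shows how a fluent chain exposes only its final output, and how keeping each step as data makes the intermediate tables available and lets a step be toggled off without editing the chain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Table:
    """Tiny immutable stand-in for a dataframe (hypothetical, for illustration)."""
    rows: tuple

    def filter(self, predicate):
        # Keep only the rows for which the predicate is true.
        return Table(tuple(r for r in self.rows if predicate(r)))

    def mutate(self, name, fn):
        # Add (or overwrite) a column computed from each row.
        return Table(tuple({**r, name: fn(r)} for r in self.rows))

    def arrange(self, key):
        # Sort rows in ascending order by the given column.
        return Table(tuple(sorted(self.rows, key=lambda r: r[key])))


def unravel(table, steps):
    """Apply each (method_name, args) step in order and keep every intermediate table."""
    snapshots = [table]
    for method_name, args in steps:
        table = getattr(table, method_name)(*args)
        snapshots.append(table)
    return snapshots


cars = Table((
    {"model": "A", "mpg": 30, "cyl": 4},
    {"model": "B", "mpg": 18, "cyl": 8},
    {"model": "C", "mpg": 24, "cyl": 4},
))

# Fluent style: one chained expression; only the final table is visible.
result = (
    cars
    .filter(lambda r: r["cyl"] == 4)
    .mutate("kpl", lambda r: round(r["mpg"] * 0.425, 1))
    .arrange("mpg")
)

# The same chain expressed as data, so each intermediate table can be inspected.
steps = [
    ("filter", (lambda r: r["cyl"] == 4,)),
    ("mutate", ("kpl", lambda r: round(r["mpg"] * 0.425, 1))),
    ("arrange", ("mpg",)),
]
snapshots = unravel(cars, steps)

# "Toggling off" the first step is just replaying the chain without it.
without_filter = unravel(cars, steps[1:])[-1]
```

Here `snapshots[1]` is the table after the filter step and `snapshots[-1]` equals `result`. Representing the chain as a list of steps is one possible design choice for this sketch; it makes reordering or disabling a step a list operation rather than a code edit followed by manual re-execution.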
