Progressive Analytics: A Computation Paradigm for Exploratory Data Analysis

Exploring data requires a fast feedback loop from the analyst to the system, with a latency below about 10 seconds because of human cognitive limitations. When data becomes large or analysis becomes complex, sequential computations can no longer complete within a few seconds, and data exploration is severely hampered. This article describes a novel computation paradigm called Progressive Computation for Data Analysis, or more concisely Progressive Analytics, which provides a low-latency guarantee at the programming-language level by performing computations progressively. Moving progressive computation to the language level relieves programmers of exploratory data analysis systems from having to implement the whole analytics pipeline progressively from scratch, which streamlines the implementation of scalable exploratory data analysis systems. The article presents the new paradigm through a prototype implementation called ProgressiVis and uses examples to explain the requirements it implies.
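To make the core idea concrete, here is a minimal sketch of a progressive computation. It assumes nothing about the actual ProgressiVis API. The function `progressive_mean` and its `quantum` (time budget in seconds) and `chunk_size` parameters are hypothetical. Instead of scanning all the data before answering, the function processes chunks until its time budget is spent and then yields a partial estimate together with the fraction of data processed. As a result, the analyst receives a usable answer roughly every `quantum` seconds, provided that a single chunk fits within the budget, and the final yield equals the exact result.

```python
import time
from typing import Iterator, Sequence, Tuple


def progressive_mean(
    data: Sequence[float],
    quantum: float = 0.5,
    chunk_size: int = 1000,
) -> Iterator[Tuple[float, float]]:
    """Hypothetical progressive mean: yield (estimate, fraction_done)
    about every `quantum` seconds instead of only at the end."""
    total = 0.0
    count = 0
    n = len(data)
    while count < n:
        deadline = time.monotonic() + quantum
        # Always process at least one chunk, then keep going until the
        # time budget for this step is exhausted or the data runs out.
        while True:
            chunk = data[count:count + chunk_size]
            total += sum(chunk)
            count += len(chunk)
            if count >= n or time.monotonic() >= deadline:
                break
        # Emit an intermediate estimate so the analyst is never kept waiting
        # longer than (approximately) one quantum.
        yield total / count, count / n


if __name__ == "__main__":
    values = [float(i) for i in range(1_000_000)]
    for estimate, progress in progressive_mean(values, quantum=0.1):
        print(f"{progress:6.1%} done, mean ~ {estimate:.1f}")
```

The generator form is one simple way to express "compute a bit, report, continue". A progressive system in the sense described above would additionally let the analyst steer or cancel the computation between steps and would chain many such operators into a pipeline.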
