Augmenting Decision Making via Interactive What-If Analysis

The fundamental goal of business data analysis is to improve business decisions using data. Business users such as sales, marketing, product, or operations managers often make decisions to achieve key performance indicator (KPI) goals such as increasing customer retention, decreasing cost, and increasing sales. To discover the relationship between data attributes hypothesized to be drivers and those corresponding to KPIs of interest, business users currently need to perform lengthy exploratory analyses, considering multitudes of combinations and scenarios and slicing, dicing, and transforming the data accordingly, for example, analyzing customer retention across quarters of the year or suggesting optimal media channels across strata of customers. However, the increasing complexity of datasets, combined with the cognitive limitations of humans, makes it challenging to keep track of multiple hypotheses at once, even for simple datasets, so performing such analyses mentally is hard. Existing commercial tools either provide partial solutions whose effectiveness remains unclear or fail to cater to business users. Here we argue for four functionalities that we believe are necessary to enable business users to interactively learn and reason about the relationships (functions) between sets of data attributes, facilitating data-driven decision making. We implement these functionalities in SYSTEMD, an interactive visual data analysis system enabling business users to experiment with the data by asking what-if questions. We evaluate the system through three business use cases: marketing mix modeling analysis, customer retention analysis, and deal closing analysis, and report on feedback from multiple business users. Overall, business users find SYSTEMD intuitive and useful for quickly testing and validating their hypotheses about KPIs of interest, as well as for making effective and fast data-driven decisions.

*Also with University of Maryland. †Also with University of Amsterdam.
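
To make the idea of a what-if question concrete, the following is a minimal sketch and not SYSTEMD's actual method, which the abstract does not specify. It assumes a single driver (ad spend) and a single KPI (sales), fits a straight line to hypothetical toy data, and then asks how the predicted KPI would change if the driver were raised by 10%. The function names `fit_line` and `what_if`, the data, and the choice of a linear model are all illustrative assumptions.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for one driver: returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def what_if(xs, slope, intercept, pct_change):
    """Predicted mean KPI if every driver value is scaled by (1 + pct_change)."""
    return mean(intercept + slope * x * (1 + pct_change) for x in xs)


# Hypothetical toy data: weekly ad spend (driver) and sales (KPI).
ad_spend = [10.0, 20.0, 30.0, 40.0, 50.0]
sales = [25.0, 45.0, 65.0, 85.0, 105.0]

slope, intercept = fit_line(ad_spend, sales)
baseline = what_if(ad_spend, slope, intercept, 0.0)   # no change to the driver
scenario = what_if(ad_spend, slope, intercept, 0.10)  # "what if spend rises 10%?"
print(f"baseline KPI: {baseline:.1f}, with +10% spend: {scenario:.1f}")
```

A prediction of this kind only reflects the association captured by the fitted model; whether changing the driver would actually move the KPI is a separate, causal question.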
