Design Principles for Data Analysis

The data science revolution has increased interest in the practice of data analysis. While much has been written about statistical thinking, a complementary form of thinking that appears in the practice of data analysis is design thinking: a problem-solving process centered on understanding the people for whom a product is being designed. For a given problem, there can be significant or subtle differences in how a data analyst (or producer of a data analysis) constructs, creates, or designs a data analysis, including differences in the choice of methods, tooling, and workflow. These choices can affect both the data analysis products themselves and the experience of the consumer of the data analysis. The role of a producer can therefore be thought of as designing the data analysis according to a set of design principles. Here, we introduce design principles for data analysis and describe how they can be mapped to data analyses in a quantitative, objective, and informative manner. We also provide empirical evidence that these principles vary within and between both producers and consumers of data analyses. Our work leads to two insights: it suggests a formal mechanism for describing data analyses based on the design principles, and it provides a framework for teaching students how to build data analyses using formal design principles.
