Evaluating the Success of a Data Analysis

A fundamental problem in the practice and teaching of data science is how to evaluate the quality of a given data analysis, which is different from evaluating the science or question underlying the data analysis. Previously, we defined a set of principles for describing data analyses that can be used to create a data analysis and to characterize the variation between data analyses. Here, we introduce a metric of quality evaluation that we call the success of a data analysis, which is different from other potential metrics such as completeness, validity, or honesty. We define a successful data analysis as one in which the principles of the analyst match those of the audience for whom the analysis is developed. In this paper, we propose a statistical model and general framework for evaluating the success of a data analysis. We argue that this framework can serve as a guide for practicing data scientists and for students in data science courses on how to build a successful data analysis.
