Principles and procedures of exploratory data analysis.

Exploratory data analysis (EDA) is a well-established statistical tradition that provides conceptual and computational tools for discovering patterns in order to foster the development and refinement of hypotheses. These tools and attitudes complement the significance and hypothesis tests used in confirmatory data analysis (CDA). Although EDA complements rather than replaces CDA, the use of CDA without EDA is seldom warranted. Even when well-specified theories are held, EDA helps one interpret the results of CDA and may reveal unexpected or misleading patterns in the data. This article introduces the central heuristics and computational tools of EDA and contrasts EDA with CDA and with exploratory statistics in general. EDA techniques are illustrated using previously published psychological data. Changes in statistical training and practice are recommended to incorporate these tools.
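As one illustration of the resistant summaries that EDA relies on, the sketch below computes a five-number summary (minimum, lower quartile, median, upper quartile, maximum) and flags values beyond the conventional boxplot fences, 1.5 times the interquartile range beyond the quartiles. It is a minimal Python sketch, not the article's own procedure: the function names are hypothetical, and the quartiles come from the standard library's `statistics.quantiles(..., method="inclusive")`, which can differ slightly from Tukey's hinges on some sample sizes.

```python
from statistics import quantiles


def five_number_summary(data):
    """Return (min, lower quartile, median, upper quartile, max).

    Quartiles use statistics.quantiles with the 'inclusive' method,
    an assumption standing in for Tukey's hinges.
    """
    values = sorted(data)
    if len(values) < 2:
        raise ValueError("need at least two observations")
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return values[0], q1, med, q3, values[-1]


def flag_outliers(data, k=1.5):
    """Return the values lying outside q1 - k*IQR and q3 + k*IQR."""
    _, q1, _, q3, _ = five_number_summary(data)
    spread = q3 - q1  # interquartile range, resistant to extreme values
    low, high = q1 - k * spread, q3 + k * spread
    return [x for x in data if x < low or x > high]


if __name__ == "__main__":
    scores = [2, 3, 3, 4, 5, 5, 6, 7, 40]
    print(five_number_summary(scores))  # (2, 3.0, 5.0, 6.0, 40)
    print(flag_outliers(scores))        # [40]
```

Because the quartiles and median are barely moved by the single extreme score, the summary still describes the bulk of the data while the fence rule singles out the value worth a closer look, which is the spirit of EDA rather than a formal test.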
