Exploratory data analysis as a foundation of inductive research

Abstract
Across academic disciplines, scientific progress is greatest when deductive and inductive approaches are in balance. To promote this balance in organizational science, rigorous inductive research aimed at phenomenon detection must be further encouraged. To this end, the present article discusses the logic and methods of exploratory data analysis (EDA), the mode of analysis concerned with discovery, exploration, and the empirical detection of phenomena in data. We begin by describing the historical and conceptual background of EDA. We then discuss two issues concerning EDA and its relationship to scientific credibility. First, we argue that EDA fosters a replication-based science by requiring cross-validation and by emphasizing the natural uncertainty of data patterns. Second, we clarify that EDA is distinguishable from other exploratory practices that are considered scientifically questionable (e.g., “p-hacking,” “data fishing,” and “data dredging”). Finally, we present a further argument for EDA: that it helps maximize the value of data. To illustrate this point, we present several graphical methods for detecting data patterns and provide references to further techniques for the interested reader.
