How to Automatically Document Data With the codebook Package to Facilitate Data Reuse

Data documentation in psychology lags behind not only many other disciplines, but also basic standards of usefulness. Psychological scientists often prefer to spend the time and effort that good documentation of existing data would require on other duties, such as writing and collecting more data. Codebooks therefore tend to be unstandardized and stored in proprietary formats, and they are rarely indexed properly by search engines. This means that rich data sets are sometimes used only once, by their creators, and then left to disappear into oblivion. Even if they can find an existing data set, researchers are unlikely to publish analyses based on it if they cannot be confident that they understand it well enough. My codebook package makes it easier to generate rich metadata in human- and machine-readable codebooks. It uses metadata from existing sources and automates some tedious tasks, such as documenting psychological scales and reliabilities, summarizing descriptive statistics, and identifying patterns of missingness. The codebook R package and Web app make it possible to generate a rich codebook in a few minutes with just three clicks. Over time, their use could lead to psychological data becoming findable, accessible, interoperable, and reusable, thereby reducing research waste and benefiting both their users and the scientific community as a whole.
