A Step-by-Step Guide to Using Secondary Data for Psychological Research

The purpose of this paper is to serve as a primer for those who have never used, or even considered using, secondary data as a resource for psychological research. Secondary data (SD) can provide a unique methodological tool with which to examine psychological issues and can serve as a valuable contribution to a program of research. However, this important resource may often be overlooked because its use can appear daunting and time-consuming. We seek to assist new users of SD by describing the process in a step-by-step manner. We address the benefits and challenges to anticipate when using SD, and discuss identifying and acquiring potential datasets, creating a personalized dataset, creating variables, statistical considerations, and the potential problem of conflicting findings when large datasets are used by multiple researchers. Our goal is to encourage researchers who are new to the approach to consider using SD as an adjunct to their program of research.
Secondary data (SD) can provide a unique methodological resource with which to examine psychological issues. Secondary data are often explored in concert with other methods, such as experimental and clinical research, to provide a well-rounded examination of a psychological construct or phenomenon. However, this important resource may be overlooked because its use can appear daunting and time-consuming. The purpose of this paper is to raise awareness of secondary data as a valuable adjunct in a program of research, and to serve as a primer for researchers who have not previously used this methodology. Our goal is to demystify the use of secondary data by describing the process in a step-by-step manner. SD can be useful for researchers at any career stage, including graduate students searching for data for a thesis topic, junior faculty looking to augment data used in their research program, or senior researchers seeking pilot data for grant applications.
Although the terms archival data and secondary data are sometimes used interchangeably in the literature, they are defined differently. Archival data come from examination of primary source documents such as letters, newspaper articles, or school or medical records (see, e.g., Wicke & Silver, 2009). Working with archival data often requires the complex and time-consuming process of tracking down original records and transcribing them to create a workable dataset. The term secondary data refers to data that have been collected and made available by a primary source. Secondary data are often collected for a specific purpose but can also be used to address questions in other fields of research. In addition, general repositories of data exist to aid researchers with factual statistics about a population of interest.

[1] Daniel R. Feenberg, et al. Improving the Accessibility of the NBER's Historical Data, 1995.

[2] Dean P. Jones, et al. Association between posttraumatic stress disorder and inflammation: A twin study, 2013, Brain, Behavior, and Immunity.

[3] D. Rubin. Inference and Missing Data, 1975.

[4] T. Hughes, et al. Drinking and drinking-related problems among heterosexual and sexual minority women, 2008, Journal of Studies on Alcohol and Drugs.

[5] Roderick J. A. Little, et al. Statistical Analysis with Missing Data, 2002.

[6] W. Eaton, et al. A prospective study of posttraumatic stress disorder symptoms and coronary heart disease in women, 2009, Health Psychology: Official Journal of the Division of Health Psychology, American Psychological Association.

[7] M. Aldenderfer, et al. Cluster Analysis. Sage University Paper Series on Quantitative Applications in the Social Sciences 07-044, 1984.

[8] B. Dohrenwend, et al. The Psychological Risks of Vietnam for U.S. Veterans: A Revisit with New Data and Methods, 2006, Science.

[9] Brady T. West, et al. Linear Mixed Models: A Practical Guide Using Statistical Software, 2006.

[10] G. Gmel, et al. Gender and alcohol consumption: patterns from the multinational GENACIS project, 2009, Addiction.

[11] S. Wilsnack, et al. Are U.S. women drinking less (or more)? Historical and aging trends, 1981-2001, 2006, Journal of Studies on Alcohol.

[12] Wayne T. Steward, et al. The impact of universal access to antiretroviral therapy on HIV stigma in Botswana, 2008, American Journal of Public Health.

[13] P. Ouimette, et al. Association Between Posttraumatic Stress Disorder and Primary Care Provider-Diagnosed Disease Among Iraq and Afghanistan Veterans, 2010, Psychosomatic Medicine.

[14] J. Talbott. Physical and Mental Health Costs of Traumatic War Experiences Among Civil War Veterans, 2007.

[15] D. Dooley, et al. Age of Alcohol Drinking Onset: Precursors and the Mediation of Alcohol Disorder, 2006.

[16] Matthew E. Kahn, et al. Heroes and Cowards: The Social Face of War, 2008.

[17] S. Wilsnack, et al. Ten-year prediction of women's drinking behavior in a nationally representative sample, 1998, Women's Health.

[18] P. Lachenbruch. Statistical Power Analysis for the Behavioral Sciences (2nd ed.), 1989.

[19] Patrick Royston, et al. Multiple Imputation of Missing Values: Update, 2005.

[20] C. Tomlinson-Keasey, et al. Opportunities and challenges posed by archival data sets, 1993.

[21] R. C. Silver, et al. A Community Responds to Collective Trauma: An Ecological Analysis of the James Byrd Murder in Jasper, Texas, 2009, American Journal of Community Psychology.

[22] D. Rubin, et al. Statistical Analysis with Missing Data, 1989.

[23] Janet B. W. Williams, et al. Diagnostic and Statistical Manual of Mental Disorders, 2013.

[24] Charles E. McCulloch, et al. Regression Methods in Biostatistics: Linear, Logistic, Survival, and Repeated Measures Models, 2005.

[25] Virginia Gil-Rivas, et al. Terrorism, acute stress, and cardiovascular health: a 3-year national study following the September 11th attacks, 2008, Archives of General Psychiatry.

[26] G. Mikhail, et al. Coronary heart disease in women, 2005, BMJ: British Medical Journal.

[27] Keith F. Widaman, et al. Studying Lives Through Time: Personality and Development, 1993.

[28] G. Molenberghs, et al. Linear Mixed Models for Longitudinal Data, 2001.

[29] R. Carney, et al. Data management and accountability in behavioral and biomedical research, 1992, The American Psychologist.

[30] D. Altman, et al. Missing data, 2007, BMJ: British Medical Journal.