Simple strategies for improving inference with linked data: a case study of the 1850–1930 IPUMS linked representative historical samples

Abstract: New large-scale linked data are revolutionizing quantitative history and demography. This paper proposes two complementary strategies for improving inference with linked historical data: using validation variables to identify higher-quality links, and applying a simple, regression-based weighting procedure to make custom research samples more representative. We demonstrate the potential value of these strategies using the 1850–1930 Integrated Public Use Microdata Series Linked Representative Samples (IPUMS-LRS), a high-quality, publicly available linked historical dataset. We show that, while incorrect linking rates appear low in the IPUMS-LRS, researchers can reduce error rates further using validation variables. We also show how researchers can use the regression-based procedure to reweight a linked sample so that its observed characteristics match those of a reference population.
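The reweighting idea can be illustrated with a minimal sketch. It assumes one common form of regression-based weighting, inverse-probability weights from a logit model of link status, which may differ in detail from the procedure used in the paper. The data are synthetic, and every name here (`fit_logit`, `link_weights`, the single binary covariate `x`) is hypothetical.

```python
import numpy as np

def fit_logit(X, y, n_iter=50, tol=1e-10):
    """Fit a logistic regression by Newton's method (intercept added here)."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        grad = Xd.T @ (y - p)                      # score vector
        hess = (Xd * (p * (1.0 - p))[:, None]).T @ Xd  # information matrix
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def link_weights(X, linked):
    """Inverse-probability weights for linked records (assumed weighting form).

    X      : covariates for every record in the reference population
    linked : 1 if the record was successfully linked, 0 otherwise
    """
    beta = fit_logit(X, linked)
    Xd = np.column_stack([np.ones(len(X)), X])
    p_link = 1.0 / (1.0 + np.exp(-Xd @ beta))
    return 1.0 / p_link[linked == 1]

# Synthetic reference population: one binary characteristic that raises the
# chance of being linked, so the unweighted linked sample over-represents it.
rng = np.random.default_rng(0)
n = 20_000
x = rng.binomial(1, 0.5, size=n).astype(float)
linked = rng.binomial(1, 0.1 + 0.3 * x).astype(float)

w = link_weights(x.reshape(-1, 1), linked)
x_linked = x[linked == 1]

pop_mean = x.mean()                          # reference population
raw_mean = x_linked.mean()                   # unweighted linked sample
weighted_mean = np.average(x_linked, weights=w)  # reweighted linked sample
```

In this toy setting the weighted mean of `x` in the linked sample matches the reference population mean, while the unweighted mean does not. Weights of this kind can only balance characteristics that are observed and included in the model.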
