Combining Family History and Machine Learning to Link Historical Records

A key challenge for research on many questions in the social sciences is that historical records are difficult to link in a way that lets investigators observe the same people at different points in their lives or across generations. In this paper, we develop a new approach that relies on millions of record links created by individual contributors to a large, public, wiki-style family tree. First, we use these “true” links to inform the decisions one must make when applying traditional linking methods. Second, we use them to construct a training data set for supervised machine learning methods. We describe our procedure and illustrate the potential of the approach by linking individuals across the 100% samples of the US decennial censuses of 1900, 1910, and 1920. We obtain an overall match rate of about 70 percent, with a false positive rate of about 12 percent. This combination of a high match rate and high accuracy lies beyond the current frontier of record linking methods.
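
The second use of the tree links, turning them into labeled training data, and the two headline metrics can be illustrated with a minimal sketch. It assumes a blocking step has already produced candidate pairs between two censuses, and it labels a pair 1 if the family tree links the source record to that target, 0 if the tree links it elsewhere, skipping sources with no tree link. The `Record` fields, the `features`, `build_training_set`, and `evaluate` helpers, and the feature set are all hypothetical illustrations, not the paper's actual variables or classifier. The metric definitions are also assumptions: match rate as the share of source records that receive a link, and false positive rate as the share of checkable links (those whose source has a tree link) that point to a different target.

```python
from __future__ import annotations

from difflib import SequenceMatcher
from typing import NamedTuple


class Record(NamedTuple):
    """A minimal census record (hypothetical fields for illustration)."""
    rid: str
    name: str
    birth_year: int
    birthplace: str


def features(a: Record, b: Record) -> list[float]:
    """Hypothetical pairwise features for one candidate link."""
    name_sim = SequenceMatcher(None, a.name.lower(), b.name.lower()).ratio()
    year_gap = float(abs(a.birth_year - b.birth_year))
    same_place = 1.0 if a.birthplace == b.birthplace else 0.0
    return [name_sim, year_gap, same_place]


def build_training_set(candidates, tree_links):
    """Label candidate pairs using tree-confirmed ("true") links.

    candidates: list of (source Record, target Record) pairs.
    tree_links: dict mapping source record id -> linked target record id.
    Pairs whose source has no tree link are skipped: their label is unknown.
    """
    X, y = [], []
    for a, b in candidates:
        if a.rid not in tree_links:
            continue
        X.append(features(a, b))
        y.append(1 if tree_links[a.rid] == b.rid else 0)
    return X, y


def evaluate(predicted, tree_links, n_source):
    """Return (match rate, false positive rate) under the assumed definitions.

    predicted: dict mapping source id -> chosen target id.
    n_source: number of records in the source census.
    """
    match_rate = len(predicted) / n_source
    checkable = [s for s in predicted if s in tree_links]
    wrong = sum(1 for s in checkable if predicted[s] != tree_links[s])
    fpr = wrong / len(checkable) if checkable else 0.0
    return match_rate, fpr


# Toy example with made-up records.
a1 = Record("a1", "John Smith", 1885, "Ohio")
a2 = Record("a2", "Mary Jones", 1890, "Iowa")
b1 = Record("b1", "John Smith", 1885, "Ohio")
b2 = Record("b2", "Jon Smith", 1886, "Ohio")
b3 = Record("b3", "Mary Jones", 1890, "Iowa")

candidates = [(a1, b1), (a1, b2), (a2, b3)]
tree_links = {"a1": "b1", "a2": "b3"}

X, y = build_training_set(candidates, tree_links)
match_rate, fpr = evaluate({"a1": "b2", "a2": "b3"}, tree_links, n_source=4)
```

In the toy run, `X` and `y` could be passed to any supervised classifier. Linking two of four source records gives a match rate of 0.5, and one of the two checkable links is wrong, giving a false positive rate of 0.5.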
