New Methods of Census Record Linking

Abstract

The Minnesota Population Center (MPC) has released linked data sets through its North Atlantic Population Project and Integrated Public Use Microdata Series, making them readily accessible to researchers. Before complete-count census microdata became available from the MPC, researchers relied on various forms of record-linking software. This article describes the techniques used in the MPC's linking program and briefly compares them with those used by other researchers. The key feature of the MPC linking method is the construction of cumulative name-similarity scores based on approximately 2.5 billion record comparisons; the method also uses support vector machines to classify potential links. The authors explain the modifications made for the final linked data sets and discuss the role of weighting variables when using linked data.
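The abstract names two technical ingredients: name-similarity scores accumulated over many record comparisons, and a support vector machine that classifies candidate links. The Python sketch below illustrates both under stated assumptions. It uses Jaro-Winkler as the name comparator, although the abstract does not name the string metric. It reads "cumulative score" as the share of pooled comparisons scoring at or below a given pair, which is one plausible interpretation and not necessarily the MPC's definition. It trains a linear SVM with a full-batch variant of the Pegasos algorithm on six hypothetical hand-labelled record pairs. The features, names, and helpers (`pair_features`, `train_linear_svm`, and so on) are illustrative inventions, not part of the MPC linking program.

```python
# Illustrative sketch only: the string metric, features, and training pairs are
# hypothetical stand-ins, not the MPC's actual linking program.
from bisect import bisect_right

def jaro(s1: str, s2: str) -> float:
    """Jaro similarity: 1.0 for identical strings, 0.0 for no common characters."""
    if s1 == s2:
        return 1.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    used1, used2 = [False] * len(s1), [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        # A character can only match the same character within `window` positions.
        for j in range(max(0, i - window), min(i + window + 1, len(s2))):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters appearing in a different order count as half-transpositions.
    m1 = [c for c, u in zip(s1, used1) if u]
    m2 = [c for c, u in zip(s2, used2) if u]
    t = sum(a != b for a, b in zip(m1, m2)) / 2
    return (matches / len(s1) + matches / len(s2) + (matches - t) / matches) / 3

def jaro_winkler(s1: str, s2: str, p: float = 0.1) -> float:
    """Jaro-Winkler: raises the Jaro score for a shared prefix of up to 4 characters."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)

def cumulative_score(score: float, sorted_pool: list) -> float:
    """Empirical CDF: share of pooled comparison scores that are <= `score`."""
    return bisect_right(sorted_pool, score) / len(sorted_pool)

def pair_features(a, b):
    """Hypothetical features for a candidate link, each scaled to [-1, 1]."""
    (first_a, last_a, year_a), (first_b, last_b, year_b) = a, b
    gap = min(abs(year_a - year_b), 4)  # birth-year gap, capped at 4 years
    return [2 * jaro_winkler(first_a, first_b) - 1,
            2 * jaro_winkler(last_a, last_b) - 1,
            1 - gap / 2]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def train_linear_svm(X, y, lam=0.1, epochs=100):
    """Full-batch Pegasos: minimizes lam/2 * ||w||^2 + mean hinge loss (no bias term)."""
    w = [0.0] * len(X[0])
    for t in range(1, epochs + 1):
        eta = 1.0 / (lam * t)
        # Hinge-loss subgradient: only pairs violating the margin (y * w.x < 1) contribute.
        step = [0.0] * len(w)
        for xi, yi in zip(X, y):
            if yi * dot(w, xi) < 1:
                for k, xk in enumerate(xi):
                    step[k] += yi * xk / len(X)
        w = [(1 - eta * lam) * wk + eta * sk for wk, sk in zip(w, step)]
    return w

def predict(w, features):
    """+1 = classify as a link, -1 = classify as a non-link."""
    return 1 if dot(w, features) > 0 else -1

# Hypothetical hand-labelled pairs: ((first, surname, birth year) in census A, same in B).
TRAINING = [
    ((("JOHN", "SMITH", 1850), ("JOHN", "SMITH", 1850)), 1),
    ((("MARTHA", "JONES", 1848), ("MARHTA", "JONES", 1849)), 1),
    ((("WILLIAM", "BROWN", 1845), ("WILLIAM", "BROWNE", 1845)), 1),
    ((("JOHN", "SMITH", 1850), ("MARY", "KELLY", 1820)), -1),
    ((("WILLIAM", "BROWN", 1845), ("PETER", "DAVIS", 1860)), -1),
    ((("JOHN", "SMITH", 1850), ("MARY", "SMYTHE", 1820)), -1),
]
X = [pair_features(a, b) for (a, b), _ in TRAINING]
y = [label for _, label in TRAINING]
w = train_linear_svm(X, y)

if __name__ == "__main__":
    print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
    # A tiny pool standing in for the billions of comparisons a real linker scores.
    names = ["SMITH", "SMYTHE", "JONES", "KELLY", "BROWN", "DAVIS"]
    pool = sorted(jaro_winkler(a, b) for a in names for b in names if a != b)
    print(cumulative_score(jaro_winkler("SMITH", "SMYTHE"), pool))
    likely = pair_features(("MARY", "KELLY", 1820), ("MARY", "KELLEY", 1821))
    unlikely = pair_features(("SARAH", "JONES", 1832), ("PETER", "DAVIS", 1860))
    print(predict(w, likely), predict(w, unlikely))  # 1 -1
```

Omitting the bias term keeps the sketch close to the original Pegasos formulation. Because every feature is centred on zero, a separating hyperplane through the origin suffices for this toy data. Real census data would call for a bias term and far more labelled pairs.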