Automatically Estimating Record Linkage False Match Rates

This paper provides a mechanism for automatically estimating record linkage false match rates in situations where the subset of the true matches is reasonably well separated from other pairs and there is no training data. The method provides an alternative to the method of Belin and Rubin (JASA 1995) and is applicable in more situations. We provide examples demonstrating why the general problem of error rate estimation (both false match and false nonmatch rates) is likely impossible in situations without training data and exceptionally difficult even in the extremely rare situations when training data are available.
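To make the two error rates concrete, the following minimal sketch computes them for a toy set of scored pairs whose true status is known. It does not implement the estimation method of this paper. It assumes the convention in which the false match rate is the share of designated links (pairs at or above a cutoff score) that are true nonmatches, and the false nonmatch rate is the share of true matches that fall below the cutoff. The function name `error_rates`, the scores, and the cutoff are hypothetical.

```python
def error_rates(scores, is_match, cutoff):
    """Return (false_match_rate, false_nonmatch_rate) for labeled pairs.

    Assumed convention (not taken from the paper's text):
      false match rate    = false matches among designated links / designated links
      false nonmatch rate = true matches below the cutoff / all true matches
    """
    # Truth labels of the pairs designated as links (score at or above the cutoff).
    links = [m for s, m in zip(scores, is_match) if s >= cutoff]
    false_matches = sum(1 for m in links if not m)

    # True matches that the cutoff fails to designate as links.
    total_matches = sum(1 for m in is_match if m)
    missed_matches = sum(1 for s, m in zip(scores, is_match) if m and s < cutoff)

    fmr = false_matches / len(links) if links else 0.0
    fnmr = missed_matches / total_matches if total_matches else 0.0
    return fmr, fnmr


# Hypothetical matching scores and truth labels for five record pairs.
scores = [9.1, 8.4, 7.7, 3.2, 1.0]
is_match = [True, True, False, True, False]
fmr, fnmr = error_rates(scores, is_match, cutoff=5.0)
```

Both quantities here depend on the truth labels, which is exactly what is missing when there is no training data; the estimation problem is to approximate them from the scores alone.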

[1]  H. B. Newcombe, et al. Automatic linkage of vital records, 1959, Science.

[2]  Howard B. Newcombe, et al. Record linkage: making maximum use of the discriminating power of identifying information, 1962, CACM.

[3]  Howard B. Newcombe, et al. Handbook of record linkage: methods for health and statistical studies, administration, and business, 1988.

[4]  William E. Winkler. On Dykstra's Iterative Fitting Procedure, 1990.

[5]  William E. Winkler, et al. String Comparator Metrics and Enhanced Decision Rules in the Fellegi-Sunter Model of Record Linkage, 1990.

[6]  W. Winkler. Improved Decision Rules in the Fellegi-Sunter Model of Record Linkage, 1993.

[7]  D. Rubin, et al. A method for calibrating false-match rates in record linkage, 1995.

[8]  W. Winkler. Using the EM Algorithm for Weight Computation in the Fellegi-Sunter Model of Record Linkage, 2000.

[9]  D. Katz, et al. American Statistical Association, 2022, The SAGE Encyclopedia of Research Design.

[10]  W. Winkler. Machine Learning, Information Retrieval, and Record Linkage, 2000.

[11]  D. Rubin, et al. Iterative Automated Record Linkage Using Mixture Models, 2001.

[12]  P. Ivax, et al. A Theory for Record Linkage, 2004.

[13]  Sebastian Thrun, et al. Text Classification from Labeled and Unlabeled Documents using EM, 2000, Machine Learning.

[14]  W. Winkler. Approximate String Comparator Search Strategies for Very Large Administrative Lists (Statistics #2005-02), 2005.

[15]  William E. Yancey. Evaluating String Comparator Performance for Record Linkage, 2005.

[16]  W. Winkler. Overview of Record Linkage and Current Research Directions, 2006.