Evaluating latent class models with conditional dependence in record linkage

Record linkage methods commonly use a traditional latent class model to classify record pairs from different sources as true matches or non-matches. This approach, first formally described by Fellegi and Sunter, assumes that agreement on the individual fields is independent conditional on the latent class (match or non-match). Violating this conditional independence assumption can bias the model's parameter estimates. We sought to further characterize how conditional dependence affects the overall misclassification rate, sensitivity, and positive predictive value in record linkage. We also evaluated several methods that account for conditional dependence: loglinear models with interaction terms identified through correlation residual plots, and Gaussian random effects models. We applied the proposed models to link newborn screening data obtained from a health information exchange. In simulations, loglinear models with interaction terms achieved the lowest misclassification rate, although this type of model cannot accommodate other data features, such as continuous measures of agreement. Gaussian random effects models, which can handle such additional features, performed better than the model assuming conditional independence and, in some situations, performed as well as the loglinear model with interaction terms.
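The Gaussian random effects alternative can be sketched in the same spirit. Below, agreement on field j within one latent class is assumed to follow a probit model with a shared random effect, P(agree_j | class, b) = Φ(a_j + σ b) with b ~ N(0, 1), in the style of the random effects latent class models of [2]; the intercepts a_j, the σ values, and the function names are hypothetical. The integral over b is approximated with Gauss-Hermite quadrature via NumPy's `hermgauss`. The sketch checks that σ = 0 recovers conditional independence, while σ > 0 induces positive within-class dependence between fields.

```python
import itertools
import math

import numpy as np

_erf = np.vectorize(math.erf)

def norm_cdf(x):
    """Standard normal CDF Phi(x), computed elementwise via math.erf."""
    return 0.5 * (1.0 + _erf(np.asarray(x, dtype=float) / math.sqrt(2.0)))

def pattern_prob(gamma, a, sigma, n_nodes=40):
    """P(gamma | class) when P(agree_j | class, b) = Phi(a_j + sigma * b), b ~ N(0, 1).

    The integral over b is approximated with Gauss-Hermite quadrature.
    """
    x, w = np.polynomial.hermite.hermgauss(n_nodes)  # nodes/weights for weight exp(-x^2)
    b = math.sqrt(2.0) * x                           # change of variables so b ~ N(0, 1)
    pj = norm_cdf(a[None, :] + sigma * b[:, None])   # (n_nodes, K): P(agree_j | b)
    lik = np.prod(np.where(gamma[None, :] == 1, pj, 1.0 - pj), axis=1)
    return float(np.sum(w * lik) / math.sqrt(math.pi))

# Hypothetical intercepts for one latent class (e.g., true matches), three fields.
a = np.array([1.2, 0.8, 1.5])
patterns = list(itertools.product([0, 1], repeat=len(a)))

within_class_cov, total_prob, marginal_gap = {}, {}, {}
for sigma in (0.0, 1.0):
    probs = {pat: pattern_prob(np.array(pat), a, sigma) for pat in patterns}
    total_prob[sigma] = sum(probs.values())
    p0 = sum(v for pat, v in probs.items() if pat[0] == 1)
    p1 = sum(v for pat, v in probs.items() if pat[1] == 1)
    p01 = sum(v for pat, v in probs.items() if pat[0] == 1 and pat[1] == 1)
    # Within-class covariance of agreement on fields 0 and 1:
    # zero under conditional independence (sigma = 0), positive when sigma > 0.
    within_class_cov[sigma] = p01 - p0 * p1
    # Closed-form marginal for comparison: P(agree_0 | class) = Phi(a_0 / sqrt(1 + sigma^2)).
    marginal_gap[sigma] = abs(p0 - float(norm_cdf(a[0] / math.sqrt(1.0 + sigma**2))))

for sigma in (0.0, 1.0):
    print(f"sigma={sigma}: within-class cov(agree_0, agree_1) = {within_class_cov[sigma]:.4f}")
```

Because each marginal agreement probability still has the closed form Φ(a_j / sqrt(1 + σ²)), σ controls only the strength of within-class dependence. Extending the same integrand to continuous agreement measures is what makes this family more flexible than loglinear models with interaction terms.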

[1] D. Rubin et al. A method for calibrating false-match rates in record linkage. 1995.

[2] M. Tan et al. Random effects models in latent class analysis for evaluating accuracy of diagnostic tests. Biometrics, 1996.

[3] L. Joseph et al. Bayesian Approaches to Modeling the Conditional Dependence Between Multiple Diagnostic Tests. Biometrics, 2001.

[4] Yves Thibaudeau. The Discrimination Power of Dependency Structures in Record Linkage. 1992.

[5] William E. Winkler et al. Matching and record linkage. 2011.

[6] T. Blakely et al. Probabilistic record linkage and a method to calculate the positive predictive value. International Journal of Epidemiology, 2002.

[7] Matthew A. Jaro et al. Advances in Record-Linkage Methodology as Applied to Matching the 1985 Census of Tampa, Florida. 1989.

[8] J. Hagenaars. Latent Structure Models with Direct Effects between Indicators. 1988.

[9] Gene H. Golub et al. Calculation of Gauss quadrature rules. Milestones in Matrix Computation, 1967.

[10] Els Goetghebeur et al. Diagnostic test analyses in search of their gold standard: latent class analyses with random effects. Statistical Methods in Medical Research, 2000.

[11] Huiping Xu et al. A Probit Latent Class Model with General Correlation Structures for Evaluating Accuracy of Diagnostic Tests. Biometrics, 2009.

[12] Johannes B. Reitsma et al. Ignoring Dependency between Linking Variables and Its Impact on the Outcome of Probabilistic Record Linkage Studies. J. Am. Medical Informatics Assoc., 2008.

[13] P. Albert et al. A Cautionary Note on the Robustness of Latent Class Models for Estimating Diagnostic Error without a Gold Standard. Biometrics, 2004.

[14] S. E. Fienberg et al. Maximum Likelihood Estimation in Latent Class Models For Contingency Table Data. arXiv:0709.3535, 2007.

[15] H. Newcombe. The Use of Medical Record Linkage for Population and Genetic Studies. Methods of Information in Medicine, 1969.

[16] Pradeep Ravikumar et al. A Comparison of String Distance Metrics for Name-Matching Tasks. IIWeb, 2003.

[17] William E. Winkler et al. String Comparator Metrics and Enhanced Decision Rules in the Fellegi-Sunter Model of Record Linkage. 1990.

[18] Ivan P. Fellegi et al. A Theory for Record Linkage. 1969.

[19] C. Clogg. Latent Class Models. 1995.

[20] D. Rubin et al. Iterative Automated Record Linkage Using Mixture Models. 2001.

[21] Vivienne J. Zhu et al. An Empiric Modification to the Probabilistic Record Linkage Algorithm Using Frequency-Based Weight Scaling. J. Am. Medical Informatics Assoc., 2009.

[22] S. Zeger et al. Latent Class Model Diagnosis. Biometrics, 2000.

[23] Josef Schürle. A method for consideration of conditional dependencies in the Fellegi and Sunter model of record linkage. 2005.

[24] P. M. Vacek et al. The effect of conditional dependence on the evaluation of diagnostic tests. Biometrics, 1985.

[25] Huiping Xu et al. Evaluating accuracy of diagnostic tests with intermediate results in the absence of a gold standard. Statistics in Medicine, 2013.

[26] Shanti Gomatam et al. An empirical comparison of record linkage procedures. Statistics in Medicine, 2002.

[27] S. D. Walter et al. Effects of dependent errors in the assessment of diagnostic test performance. Statistics in Medicine, 1997.

[28] Guohua Yan et al. Stepwise Variable Selection for Loglinear Mixture in Record Linkage. 2010.

[29] Margaret Sullivan Pepe et al. Insights into latent class analysis of diagnostic test performance. Biostatistics, 2007.

[30] Vladimir I. Levenshtein et al. Binary codes capable of correcting deletions, insertions, and reversals. 1965.

[31] D. Rubin et al. Maximum likelihood from incomplete data via the EM algorithm (with discussion). 1977.