Choquet integral for record linkage

Record linkage is used in data privacy to evaluate the disclosure risk of protected data. It models potential attacks in which an intruder attempts to link records from the protected data to the original data. In this paper, we introduce a novel distance-based record linkage method that uses the Choquet integral to compute the distance between records. We use a fuzzy measure to weight each subset of variables in a record. This allows us to improve on standard record linkage and provides insight into the re-identification risk of each variable and of the interactions between variables. To this end, we use a supervised learning approach that determines the optimal fuzzy measure for the linkage.
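As an illustration of the idea, the following minimal Python sketch computes a Choquet-integral distance between two records and links each protected record to its nearest original record. It makes several assumptions that the abstract does not state: the integral is applied to per-variable squared differences, the fuzzy measure is given as a dictionary keyed by frozensets of variable indices, and the names `choquet_integral`, `choquet_distance`, `link`, and `MU` are hypothetical. The fuzzy measure here is fixed by hand; the supervised learning step that would determine it is not shown.

```python
def choquet_integral(values, mu):
    """Discrete Choquet integral of non-negative `values` w.r.t. fuzzy measure `mu`.

    `mu` maps frozensets of variable indices to weights in [0, 1], with
    mu(all variables) = 1 and mu monotone under set inclusion (assumed, not checked).
    """
    # Sort variable indices by increasing value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    total, prev = 0.0, 0.0
    for k, idx in enumerate(order):
        # Coalition of variables whose value is at least values[idx].
        coalition = frozenset(order[k:])
        total += (values[idx] - prev) * mu[coalition]
        prev = values[idx]
    return total


def choquet_distance(a, b, mu):
    """Aggregate per-variable squared differences with the Choquet integral.

    Using squared differences is an assumption made for this sketch.
    """
    diffs = [(x - y) ** 2 for x, y in zip(a, b)]
    return choquet_integral(diffs, mu)


def link(protected, original, mu):
    """For each protected record, return the index of the closest original record."""
    return [
        min(range(len(original)), key=lambda j: choquet_distance(p, original[j], mu))
        for p in protected
    ]


# Hand-picked (not learned) fuzzy measure over two variables, indices 0 and 1.
MU = {
    frozenset(): 0.0,
    frozenset({0}): 0.3,
    frozenset({1}): 0.5,
    frozenset({0, 1}): 1.0,
}

if __name__ == "__main__":
    original = [(1.0, 2.0), (5.0, 5.0)]
    protected = [(5.1, 4.8), (0.9, 2.2)]
    print(link(protected, original, MU))
```

When the fuzzy measure is additive, that is, when the weight of every subset equals the sum of the weights of its members, this distance reduces to an ordinary weighted sum of the per-variable differences. A non-additive measure such as `MU` above is what lets the weights express interactions between variables.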
