Secure attribute sharing of linked microdata

Two organizations that hold records on the same set of individuals can benefit from sharing attributes of these individuals. The combined data, with records linked on common identifying information, is termed linked microdata. Linked microdata attributes can add considerable value to organizations by enabling analyses that yield important information at the individual (record) level. We illustrate practical examples of the need for, and benefits of, sharing linked microdata and identify important privacy issues in this context. Based on a conditional distribution approach, we develop a procedure (SASH) for sharing masked attributes in linked microdata that addresses these privacy issues. Our experimental results show that SASH meets a priori expectations of analytical usefulness without either party having to provide true values of its attribute data. Our results also show that an ad hoc approach such as data swapping cannot achieve privacy without sacrificing usefulness, or vice versa. Our study should provide immediate practical benefits to organizations interested in secure attribute sharing of linked microdata.
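The abstract does not spell out SASH's equations. As a minimal sketch of the general conditional distribution idea only, assume one confidential attribute `x` held by one party and one shared attribute `s`, and assume the pair is bivariate normal. Each masked value is drawn from the estimated conditional distribution of `x` given `s`, so a record's released value depends on its `s` and on summary statistics of `x`, but not on its own true `x`. The helper names `pearson` and `mask_conditional` and the bivariate-normal assumption are ours; this is not the SASH procedure itself.

```python
import random
from statistics import fmean, stdev


def pearson(a, b):
    """Sample Pearson correlation between two equal-length sequences."""
    ma, mb = fmean(a), fmean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)
    return cov / (stdev(a) * stdev(b))


def mask_conditional(x, s, seed=None):
    """Hypothetical helper (illustration only, not SASH).

    Replaces confidential values x with draws from the estimated conditional
    distribution of x given s, assuming (x, s) is bivariate normal.
    """
    rng = random.Random(seed)
    mx, ms = fmean(x), fmean(s)
    sx, ss = stdev(x), stdev(s)
    rho = pearson(x, s)
    slope = rho * sx / ss                     # regression coefficient of x on s
    resid_sd = sx * (1.0 - rho ** 2) ** 0.5   # conditional standard deviation of x given s
    # Each draw uses the record's s and summary statistics of x,
    # never the record's own true x value.
    return [mx + slope * (si - ms) + rng.gauss(0.0, resid_sd) for si in s]
```

Under these assumptions the masked attribute keeps approximately the same mean, standard deviation, and correlation with `s` as the original, while its correlation with the true `x` drops to roughly ρ², where ρ is the correlation between `x` and `s`.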
Secure sharing of linked microdata attributes can add considerable value to organizations.
The SASH procedure enables two parties to share microdata without either party having to provide true values of attribute data.
Data masked by SASH preserve the statistical characteristics of the original data and minimize disclosure risk.
An ad hoc approach such as data swapping cannot achieve privacy without sacrificing usefulness, or vice versa.
The SASH procedure has the following specific characteristics:
(a) Predictable analytical validity and security grounded in a sound theoretical basis (the conditional distribution approach).
(b) Flexibility in determining which attributes, if any, to share, based on security considerations.
(c) Robustness to distributional assumptions, because much of the information exchanged is rank-based and consequently non-parametric.
(d) Better user acceptance, because reverse-mapping maintains marginal characteristics, alleviating concerns about "artificial data".
(e) Clear and immediate practical utility, enabling individual customer-level decisions rather than only group-level decisions.
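Items (c) and (d) refer to rank-based information and reverse-mapping. One common reading, which we assume here (it is the rank-order replacement step used in data shuffling), is that each record's masked value is replaced by the original value of the same rank. The released attribute then has exactly the original marginal distribution, while its rank order, and hence its rank-based dependence on other attributes, comes from the masked values. `reverse_map` is a hypothetical helper; it assumes equal-length inputs and breaks ties by record order.

```python
def reverse_map(masked, original):
    """Hypothetical helper illustrating rank-based reverse mapping.

    The record holding the k-th smallest masked value receives the k-th
    smallest original value, so the output is a permutation of `original`
    whose rank order matches `masked`.
    """
    if len(masked) != len(original):
        raise ValueError("masked and original must have the same length")
    # Record indices ordered by masked value; sorted() is stable,
    # so ties keep their original record order.
    order = sorted(range(len(masked)), key=lambda i: masked[i])
    sorted_orig = sorted(original)
    result = [None] * len(masked)
    for rank, idx in enumerate(order):
        result[idx] = sorted_orig[rank]
    return result
```

For example, `reverse_map([0.3, -1.2, 2.5], [10, 40, 25])` returns `[25, 10, 40]`: the released values are exactly the original ones, reassigned in the rank order of the masked values.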
