Data swapping as a decision problem

We construct a decision-theoretic formulation of data swapping in which quantitative measures of disclosure risk and data utility are used to select one release from a possibly large set of candidates. The decision variables are the swap rate, the swap attribute(s) and, possibly, constraints on the unswapped attributes. Risk–utility frontiers, consisting of those candidates not dominated in (risk, utility) space by any other candidate, are a principal tool for reducing the scale of the decision problem. We introduce multiple measures of disclosure risk and data utility, including utility measures based directly on use of the swapped data for statistical inference. Their behavior, and the insights they give into the decision problem, are illustrated using data from the Current Population Survey, the well-studied “Czech auto worker data” and data on schools and administrators generated by the National Center for Education Statistics.
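
The idea of a risk–utility frontier can be illustrated with a minimal sketch. It assumes that each candidate release has already been scored with a single risk value (lower is better) and a single utility value (higher is better). The `Candidate` type, the attribute names and all numbers are hypothetical, chosen only for illustration, and the simple quadratic scan is not necessarily the computation used in the paper.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """One candidate release (hypothetical structure for illustration)."""
    swap_rate: float
    swap_attribute: str
    risk: float      # disclosure risk: lower is better
    utility: float   # data utility: higher is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b on both criteria and strictly better on one."""
    no_worse = a.risk <= b.risk and a.utility >= b.utility
    strictly_better = a.risk < b.risk or a.utility > b.utility
    return no_worse and strictly_better


def risk_utility_frontier(candidates: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates (O(n^2) scan)."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Made-up scores for five candidate releases.
candidates = [
    Candidate(0.01, "age", risk=0.30, utility=0.95),
    Candidate(0.05, "age", risk=0.20, utility=0.90),
    Candidate(0.05, "sex", risk=0.25, utility=0.85),  # dominated by (0.05, "age")
    Candidate(0.10, "age", risk=0.10, utility=0.70),
    Candidate(0.10, "sex", risk=0.15, utility=0.65),  # dominated by (0.10, "age")
]

frontier = risk_utility_frontier(candidates)
for c in frontier:
    print(c.swap_rate, c.swap_attribute, c.risk, c.utility)
```

In this toy example only three of the five candidates survive, so the decision maker needs to weigh risk against utility only among those three. With several risk or utility measures, the same dominance test would extend to more coordinates.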
