An Intuitive Formulation and Solution of the Exact Cell-Bounding Problem for Contingency Tables of Conditional Frequencies

Recent research has raised questions about the structure and calculation of bounds on the underlying cell counts of a contingency table released in the form of conditional probabilities. This problem has implications for statistical disclosure limitation. Assuming that the unrounded conditionals and the sample size are known, we elucidate the mathematical structure of the problem in fairly elementary terms. To do so, we reformulate a standard integer programming approach as a knapsack problem, show that this reformulation yields substantial insight into the problem, and illustrate it on several datasets. In particular, we demonstrate that the tightest bounds are much easier to calculate than previously believed, and we identify circumstances in which disclosure is either guaranteed or unlikely to occur.
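To make the knapsack structure concrete, here is a minimal Python sketch for the simplest case: a two-way table whose row conditionals p(j | i) are released exactly, together with the sample size N. It is an illustration under stated assumptions, not the authors' code and not necessarily their exact formulation. The reasoning runs as follows. Write each row's conditionals over their least common denominator d_i, so that p(j | i) = a_ij / d_i with integer a_ij. A row total n_i+ is compatible with the release only if n_i+ * p(j | i) is an integer for every j, that is, only if n_i+ = k_i * d_i for some integer k_i, and then every cell is n_ij = k_i * a_ij. The only remaining constraint is d_1 k_1 + ... + d_I k_I = N with every k_i >= 1 (we assume each released row has a positive total), which is an equality-constrained integer knapsack in the row multipliers. The exact bounds on n_ij are therefore a_ij times the smallest and largest feasible k_i, which the sketch finds with a standard unbounded-knapsack reachability table. The function names (row_structure, reachable_sums, cell_bounds) are hypothetical, and math.lcm requires Python 3.9 or later.

```python
from fractions import Fraction
from math import lcm


def row_structure(cond_row):
    """For one row of exact conditionals, return (d, a): d is the least common
    denominator, so every admissible row total is k * d, and a[j] = d * p[j]."""
    probs = [Fraction(p) for p in cond_row]  # pass Fractions or strings like "1/3", not floats
    d = lcm(*(p.denominator for p in probs))
    a = [int(p * d) for p in probs]
    return d, a


def reachable_sums(weights, target):
    """reach[s] is True iff s = sum(w * m_w) for some integers m_w >= 0
    (unbounded-knapsack / coin-change reachability)."""
    reach = [False] * (target + 1)
    reach[0] = True
    for w in weights:
        # Ascending s lets each weight be reused any number of times.
        for s in range(w, target + 1):
            if reach[s - w]:
                reach[s] = True
    return reach


def cell_bounds(cond, n_total):
    """Exact (min, max) for every cell n_ij, given exact row conditionals
    cond[i][j] = p(j | i) and sample size n_total. Assumes every row total > 0."""
    rows = [row_structure(r) for r in cond]
    d = [d_i for d_i, _ in rows]
    # Write k_i = 1 + m_i with m_i >= 0; then sum_i d_i * m_i must equal slack.
    slack = n_total - sum(d)
    if slack < 0:
        raise ValueError("no table is consistent with these conditionals and N")
    bounds = []
    for r, (d_r, a_r) in enumerate(rows):
        reach = reachable_sums(d[:r] + d[r + 1:], slack)
        # m_r is feasible iff the other rows can absorb the remaining slack.
        feasible = [m for m in range(slack // d_r + 1) if reach[slack - m * d_r]]
        if not feasible:
            raise ValueError("no table is consistent with these conditionals and N")
        k_lo, k_hi = 1 + feasible[0], 1 + feasible[-1]
        bounds.append([(a * k_lo, a * k_hi) for a in a_r])
    return bounds
```

For row conditionals (1/2, 1/2) and (1/3, 2/3) with N = 10, the only solution of 2k_1 + 3k_2 = 10 with k_1, k_2 >= 1 is k = (2, 2), so every cell is pinned down exactly and disclosure is guaranteed: cell_bounds returns [[(2, 2), (2, 2)], [(2, 2), (4, 4)]]. With N = 20 the feasible multipliers are (1, 6), (4, 4), and (7, 2), giving bounds [[(1, 7), (1, 7)], [(2, 6), (4, 12)]]; note that the attainable values between the bounds can have gaps (the first row's multiplier takes only 1, 4, or 7).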