A Genetic Approach to Statistical Disclosure Control

Statistical disclosure control is the collective name for a range of tools used by data providers, such as government departments, to protect the confidentiality of individuals or organizations. When the published tables contain magnitude data such as turnover or health statistics, the preferred method is to suppress the values of certain cells. Assigning a cost to the information lost by suppressing any given cell creates the “cell suppression problem”: finding the minimum-cost set of suppressions that meets the confidentiality constraints. Solving this problem simultaneously for all of the sensitive cells in a table is NP-hard and is computationally infeasible for medium-sized to large tables. In this paper, we describe the development of a heuristic tool for this problem that hybridizes linear programming (to solve a relaxed version for a single sensitive cell) with a genetic algorithm (to seek an order for considering the sensitive cells that minimizes the final cost). Using a range of real-world and representative “artificial” datasets, we show that the method provides relatively low-cost solutions for far larger tables than exact (optimal) approaches can handle. We show that our genetic approach significantly improves on the initial solutions provided by existing heuristics for cell ordering, and that it outperforms local search. This approach is then extended and applied to large statistical tables with over 200,000 cells.
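The hybrid idea described above, a genetic algorithm searching over the order in which sensitive cells are protected, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, the toy data, the operator choices (order crossover, swap mutation, binary tournament selection, replace-worst), and in particular `protect_in_order`, which is a simple greedy stand-in for the per-cell linear programming step rather than an LP.

```python
import random

def protect_in_order(order, candidates, cell_cost):
    """Hypothetical stand-in for the per-cell LP step (not an LP).

    Sensitive cells are processed in `order`; for each one we pick the
    candidate set of complementary cells with the lowest additional cost,
    given that already-suppressed cells cost nothing extra.
    """
    suppressed, total = set(), 0
    for s in order:
        best = min(
            candidates[s],
            key=lambda c: sum(cell_cost[x] for x in c if x not in suppressed),
        )
        for x in best:
            if x not in suppressed:
                suppressed.add(x)
                total += cell_cost[x]
    return total

def order_crossover(p1, p2, rng):
    """Order crossover: copy a slice from p1, fill the rest in p2's order."""
    n = len(p1)
    a, b = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[a:b + 1] = p1[a:b + 1]
    kept = set(child[a:b + 1])
    fill = iter(g for g in p2 if g not in kept)
    return [g if g is not None else next(fill) for g in child]

def swap_mutation(perm, rng):
    """Return a copy of perm with two positions exchanged."""
    i, j = rng.sample(range(len(perm)), 2)
    out = list(perm)
    out[i], out[j] = out[j], out[i]
    return out

def evolve_order(sensitive, cost_fn, pop_size=20, generations=50, seed=0):
    """Steady-state GA over permutations of the sensitive cells."""
    rng = random.Random(seed)
    pop = [rng.sample(sensitive, len(sensitive)) for _ in range(pop_size)]
    best = min(pop, key=cost_fn)

    def pick():
        # Binary tournament selection.
        a, b = rng.sample(pop, 2)
        return a if cost_fn(a) <= cost_fn(b) else b

    for _ in range(generations):
        child = swap_mutation(order_crossover(pick(), pick(), rng), rng)
        worst = max(range(pop_size), key=lambda k: cost_fn(pop[k]))
        if cost_fn(child) <= cost_fn(pop[worst]):
            pop[worst] = child  # replace the worst individual
        if cost_fn(child) < cost_fn(best):
            best = child
    return best, cost_fn(best)

# Toy instance (made up): suppression cost per cell, and for each sensitive
# cell the alternative sets of complementary cells that would protect it.
cell_cost = {"a": 5, "b": 3, "c": 4, "d": 2}
candidates = {
    "s1": [("a",), ("b", "c")],
    "s2": [("b",), ("a", "d")],
    "s3": [("c", "d"), ("a",)],
}
best_order, best_cost = evolve_order(
    list(candidates),
    lambda order: protect_in_order(order, candidates, cell_cost),
)
```

The point of the sketch is only that the final cost depends on the processing order, so searching over permutations can lower it; a real implementation would replace `protect_in_order` with the LP relaxation and would cache fitness evaluations, since each one is expensive.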
