Exact, Heuristic and Metaheuristic Methods for Confidentiality Protection by Controlled Tabular Adjustment

Government agencies and commercial organizations that report data face the task of representing the data meaningfully while simultaneously protecting the confidentiality of critical data components. The challenge is to organize and disseminate data in a form that prevents these components from being unmasked by corporate espionage or falling prey to other efforts to penetrate the security of the underlying information. Unscrupulous data investigators could use unprotected data sources to infer sensitive personal data about individuals. Besides harming individuals, such disclosures can drastically reduce the willingness of future respondents to provide valuable data. Controlled tabular adjustment is a recently developed approach for protecting sensitive information by imposing a special form of statistical disclosure limitation on tabular data. The underlying model gives rise to a mixed integer linear programming problem involving both continuous and discrete (zero-one) variables. In this paper we develop new hybrid heuristics and a new metaheuristic learning approach for solving this model, and compare their performance to previous heuristics and to an exact algorithm in the ILOG-CPLEX software. Our new approaches are based on partitioning the problem into its discrete and continuous components. We first create a hybrid that reduces the number of binary variables through a grouping procedure that combines an exact mathematical programming model with constructive heuristics. Finally, we introduce a new metaheuristic learning method that significantly improves the quality of the solutions obtained.
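
To make the structure of the mixed integer model concrete, the following is a sketch of the controlled tabular adjustment formulation as it is commonly stated in the disclosure limitation literature. It is not necessarily the exact model used in this paper, and all notation is an assumption introduced here for illustration: cell values $a_i$ that satisfy the table's additive constraints $Ma = 0$, a set $S$ of sensitive cells with (symmetric) protection levels $p_i$, nonnegative upward and downward adjustments $y_i^{+}, y_i^{-}$, adjustment costs $c_i$, bounds $u_i$ on the adjustment of nonsensitive cells, and a zero-one variable $I_i$ that selects the direction in which sensitive cell $i$ is moved.

```latex
% Sketch of a generic controlled tabular adjustment (CTA) model.
% All symbols are illustrative assumptions, not taken from the paper.
\begin{align*}
\min_{y^{+},\, y^{-},\, I} \quad
  & \sum_{i=1}^{n} c_i \left( y_i^{+} + y_i^{-} \right) \\
\text{s.t.} \quad
  & M \left( y^{+} - y^{-} \right) = 0
    && \text{(adjusted table stays additive)} \\
  & y_i^{+} = p_i \, I_i, \quad y_i^{-} = p_i \left( 1 - I_i \right)
    && i \in S \\
  & 0 \le y_i^{+} \le u_i, \quad 0 \le y_i^{-} \le u_i
    && i \notin S \\
  & I_i \in \{0, 1\}
    && i \in S
\end{align*}
```

Under these assumptions, fixing the zero-one variables $I_i$ leaves a linear program in the continuous adjustments $y^{+}, y^{-}$, which is the discrete/continuous partition that the heuristics described above exploit.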
