Optimal Disclosure Limitation Strategy in Statistical Databases: Deterring Tracker Attacks through Additive Noise

Abstract: Disclosure limitation methods transform statistical databases to protect confidentiality, a practical concern of statistical agencies. A statistical database responds to queries with aggregate statistics. The database administrator should maximize legitimate data access while keeping the risk of disclosure below an acceptable level. Legitimate users seek statistical information, generally in aggregate form; malicious users, the data snoopers, attempt to infer confidential information about an individual data subject. Tracker attacks are of special concern for databases accessed online. This article derives optimal disclosure limitation strategies under tracker attacks for the important case of data masking through additive noise. Operational measures of the utility of data access and of disclosure risk are developed. The utility of data access is expressed so that trade-offs can be made between the quantity and the quality of data to be released. Application is made to Ohio data from the 1990 census. The article derives conditions under which an attack by a data snooper is better thwarted by a combination of query restriction and data masking than by either disclosure limitation method separately. Data masking by independent noise addition and data perturbation are considered as extreme cases in the continuum of data masking using positively correlated additive noise. Optimal strategies are established for the data snooper. Circumstances are determined under which adding autocorrelated noise is preferable to using existing methods of either independent noise addition or data perturbation. Both moving average and autoregressive noise addition are considered.
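To make the threat concrete, the following is a minimal sketch of a tracker-style differencing attack against SUM queries, and of why independent additive noise on each response only partly deters it. It is an illustration under stated assumptions, not the article's model: the toy income data, the function names (`noisy_sum`, `tracker_estimate`, `averaged_attack`), the Gaussian noise, and the value of `sigma` are all hypothetical.

```python
import random

def noisy_sum(values, indices, sigma, rng):
    """Answer a SUM query, adding fresh N(0, sigma^2) noise to the response."""
    return sum(values[i] for i in indices) + rng.gauss(0.0, sigma)

def tracker_estimate(values, target, tracker, sigma, rng):
    """Difference two answers: SUM(tracker plus target) - SUM(tracker)."""
    with_target = noisy_sum(values, tracker + [target], sigma, rng)
    without_target = noisy_sum(values, tracker, sigma, rng)
    return with_target - without_target

def averaged_attack(values, target, n_trackers, tracker_size, sigma, rng):
    """Repeat the differencing with many random trackers and average."""
    others = [i for i in range(len(values)) if i != target]
    estimates = []
    for _ in range(n_trackers):
        tracker = rng.sample(others, tracker_size)
        estimates.append(tracker_estimate(values, target, tracker, sigma, rng))
    return sum(estimates) / len(estimates)

rng = random.Random(0)                      # fixed seed for repeatability
incomes = [rng.randint(20_000, 90_000) for _ in range(50)]  # toy data
target = 7                                  # hypothetical data subject
sigma = 1_000.0                             # assumed noise standard deviation

# Without noise, one pair of queries recovers the confidential value.
exact = tracker_estimate(incomes, target, list(range(10, 20)), 0.0, rng)

# With independent noise, a single difference carries variance 2 * sigma^2,
# but averaging over many trackers shrinks the snooper's error.
averaged = averaged_attack(incomes, target, 400, 10, sigma, rng)

print(incomes[target], exact, round(averaged))
```

Because the noise in this sketch is drawn afresh for every response, the snooper's averaged estimate tightens as more tracker pairs are issued. If the noise were instead attached once to each record, repeated differencing would return the same perturbed value and averaging would gain nothing, which is one way to read the contrast the abstract draws between the two ends of the correlated-noise continuum.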
