Disclosure Risk vs. Data Utility: The R-U Confidentiality Map

Americans are urged to be more savvy about preventing their personal information from falling into strangers' hands. We are advised to shred documents containing our Social Security numbers, never to give our credit card numbers to people who call us, and not to put our children's names on our personal web sites. At the same time, more and more information is being collected about each of us as we go about our daily lives: buying things, paying our taxes, and using public services. Since 9/11, Americans have become more aware of the vast amount of personal information available, as government proposals to integrate and mine this information to thwart terrorism have come under public scrutiny. Most of these proposals involve gaining information about individuals by using the newest technology to link files from diverse sources. Critics have asked, "What is to keep this technology from being used for inappropriate or illicit purposes?"

Maintaining a long tradition, government statistical agencies, such as the U.S. Census Bureau and the National Center for Education Statistics, have established policies to protect the privacy of respondents and maintain the confidentiality of the data they collect. In many cases this mandate, to protect privacy by avoiding unwarranted intrusion and to maintain confidentiality by ensuring that collected data are not improperly used, is stipulated by federal legislation, such as Title 13 of the U.S. Code for the Census Bureau.

Set in tension with this mandate for privacy and confidentiality is the mandate to provide reliable data products bearing on the functioning of our economy and society. These data inform public policy decisions, such as where to put new roads and schools and how much money is needed to meet the medical needs of the poor. In addition, other users, such as academics, journalists, and corporate planners, depend on these data as the factual basis for their strategic analyses and managerial decisions.
