On the connections between statistical disclosure control for microdata and some artificial intelligence tools

Statistical disclosure control (SDC) and artificial intelligence (AI) use similar tools for different purposes. This work describes the elements common to both areas in order to increase their synergy. SDC is a discipline that seeks to modify statistical data so that they can be published (typically by National Statistical Offices) without giving away the identity of any individual behind the data. When dealing with individual data (microdata in SDC jargon), SDC procedures and AI knowledge integration procedures rely on similar principles for different purposes (masking data vs. improving its quality). Similarities can also be found between methods for evaluating re-identification risk in SDC and data mining tools for making data consistent. This paper explores those methodological connections with the aim of stimulating interaction between the two fields. In particular, data mining turns out to be a common interest of both.
