Microaggregation for Categorical Variables: A Median Based Approach

Microaggregation is a masking procedure used to protect confidential data prior to their public release. This technique, which relies on clustering and aggregation, has so far been used solely for numerical data. In this work we introduce a microaggregation procedure for categorical variables. We describe the new masking method and analyse its results according to several indices found in the literature. The method is compared with Top and Bottom Coding, Global Recoding, Rank Swapping and PRAM.
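
To make the idea of clustering followed by median aggregation concrete, the following is a minimal sketch for a single ordinal variable. It is an illustration under stated assumptions, not the procedure proposed in the paper: the function names, the fixed-size grouping of rank-sorted records, and the use of the lower median are all hypothetical choices made here for the example.

```python
from typing import List, Sequence


def ordinal_median(values: Sequence[str], order: Sequence[str]) -> str:
    """Return the lower median of ordinal labels under the given category order."""
    rank = {label: i for i, label in enumerate(order)}
    ranked = sorted(values, key=rank.__getitem__)
    return ranked[(len(ranked) - 1) // 2]


def microaggregate_ordinal(values: Sequence[str], order: Sequence[str], k: int) -> List[str]:
    """Mask one ordinal variable (hypothetical helper, not the paper's algorithm).

    Records are sorted by category rank, split into consecutive groups of at
    least k records, and each value is replaced by its group's lower median.
    If there are fewer than k records in total, they form a single group.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    rank = {label: i for i, label in enumerate(order)}
    # Indices of the records, sorted by the rank of their category.
    idx = sorted(range(len(values)), key=lambda i: rank[values[i]])
    n = len(idx)
    masked = list(values)
    start = 0
    while start < n:
        end = start + k
        # Fold a short tail into the current group so no group is smaller than k.
        if n - end < k:
            end = n
        group = idx[start:end]
        med = ordinal_median([values[i] for i in group], order)
        for i in group:
            masked[i] = med
        start = end
    return masked


data = ["low", "high", "medium", "low", "high", "medium", "low"]
scale = ["low", "medium", "high"]
print(microaggregate_ordinal(data, scale, k=3))
```

The median is used here because, unlike the mean, it is defined for ordinal scales and always returns one of the original categories. For nominal variables without an order, a different aggregate such as the most frequent value would be needed.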
