Database and Expert Systems Applications

Collecting, releasing and sharing microdata about individuals is needed in some domains to support research initiatives that aim to create valuable new knowledge by means of data mining and analysis tools. Individuals' anonymity must therefore be ensured, to guarantee their privacy, prior to publication. k-anonymity by microaggregation is a widely accepted model for data anonymization. It consists in breaking the association between the identity of data subjects, i.e. individuals, and their confidential information. However, this method shows its limits when dealing with real datasets, which are characterized by a large number of attributes and the presence of noisy data. Reducing the information loss incurred during the anonymization process is therefore a pressing challenge. This paper addresses that challenge. To do so, we propose a microaggregation algorithm called Micro-PFSOM, based on fuzzy possibilistic clustering. The main thrust of this algorithm lies in applying a hybrid anonymization process.
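To make the idea of k-anonymity by microaggregation concrete, the following is a minimal sketch of a naive fixed-size scheme. It is not Micro-PFSOM, whose details are not given here. It assumes purely numeric quasi-identifiers, orders records by distance to the global centroid (a crude heuristic chosen only for brevity), and measures information loss as SSE/SST, a ratio commonly used in the microaggregation literature. The function names `microaggregate` and `information_loss` are hypothetical.

```python
from statistics import mean


def microaggregate(records, k):
    """Naive fixed-size microaggregation (illustrative only, not Micro-PFSOM).

    Every record is replaced by the centroid of a group of at least k
    records, so each published tuple is shared by at least k individuals.
    """
    if k < 1 or len(records) < k:
        raise ValueError("need k >= 1 and at least k records")
    dims = len(records[0])
    centroid = [mean(r[d] for r in records) for d in range(dims)]

    # Order record indices by squared distance to the global centroid.
    def dist2(i):
        return sum((records[i][d] - centroid[d]) ** 2 for d in range(dims))

    order = sorted(range(len(records)), key=dist2)

    # Cut the ordering into consecutive groups of k records; a short
    # trailing group is merged into the previous one so no group is < k.
    groups = [order[i:i + k] for i in range(0, len(order), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        groups[-2].extend(groups.pop())

    # Replace each record by the centroid of its group.
    anonymized = [None] * len(records)
    for group in groups:
        c = tuple(mean(records[i][d] for i in group) for d in range(dims))
        for i in group:
            anonymized[i] = c
    return anonymized, groups


def information_loss(records, anonymized):
    """SSE / SST: 0 means no loss, 1 means all variability was removed."""
    dims = len(records[0])
    centroid = [mean(r[d] for r in records) for d in range(dims)]
    sse = sum((r[d] - a[d]) ** 2
              for r, a in zip(records, anonymized) for d in range(dims))
    sst = sum((r[d] - centroid[d]) ** 2
              for r in records for d in range(dims))
    return sse / sst if sst else 0.0


if __name__ == "__main__":
    data = [(1.0, 2.0), (2.0, 1.0), (10.0, 11.0), (11.0, 10.0), (5.0, 5.0)]
    anon, groups = microaggregate(data, k=2)
    print(anon)
    print(information_loss(data, anon))
```

A better grouping strategy keeps similar records together, which lowers SSE and hence the information loss; that is the quantity a clustering-based approach would try to reduce.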
