An algorithm for k-anonymous microaggregation and clustering inspired by the design of distortion-optimized quantizers

We present a multidisciplinary solution to the problems of anonymous microaggregation and clustering, illustrated with two applications, namely privacy protection in databases, and private retrieval of location-based information. Our solution is perturbative, is based on the same privacy criterion used in microdata k-anonymization, and provides anonymity through a substantial modification of the Lloyd algorithm, a celebrated quantization design algorithm, endowed with numerical optimization techniques.

Our algorithm is particularly suited to the important problem of k-anonymous microaggregation of databases, with a small integer k representing the number of individual respondents indistinguishable from each other in the published database. Our algorithm also exhibits excellent performance in the problem of clustering or macroaggregation, where k may take on arbitrarily large values. We illustrate its applicability in this second, somewhat less common case, by means of an example of location-based services. Specifically, location-aware devices entrust a third party with accurate location information. This party then uses our algorithm to create distortion-optimized, size-constrained clusters, where k nearby devices share a common centroid location, which may be regarded as a distorted version of the original one. The centroid location is sent back to the devices, which use it when contacting untrusted location-based information providers, in lieu of the exact home location, to enforce k-anonymity.

We compare the performance of our novel algorithm to the state-of-the-art microaggregation algorithm MDAV, on both synthetic and standardized real data, which encompass the cases of small and large values of k. The most promising aspect of our proposed algorithm is its capability to maintain the same k-anonymity constraint, while outperforming MDAV by a significant reduction in data distortion, in all the cases considered.
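The two quantities traded off above, the k-anonymity constraint and the data distortion, can be made concrete with a minimal Python sketch. It is an illustration only, and it is neither the modified Lloyd algorithm nor MDAV: the partition rule `naive_partition` (sort by coordinates, cut into groups of k) is a hypothetical baseline, and the names `aggregate`, `points`, `labels`, `released` and `mse` are assumptions introduced here. Distortion is assumed to be the mean squared Euclidean error between each original point and the centroid released in its place.

```python
from collections import defaultdict


def naive_partition(points, k):
    """Hypothetical baseline partition (not MDAV, not the Lloyd-based method).

    Sorts points lexicographically and cuts the sorted list into groups of k;
    the last group absorbs any leftovers, so every group has at least k points.
    """
    if len(points) < k:
        raise ValueError("need at least k points")
    order = sorted(range(len(points)), key=lambda i: points[i])
    n_groups = len(points) // k
    labels = [0] * len(points)
    for rank, i in enumerate(order):
        labels[i] = min(rank // k, n_groups - 1)
    return labels


def aggregate(points, labels, k):
    """Replace each point by its group centroid; return (released, mse).

    Raises ValueError if any group has fewer than k members, that is,
    if the k-anonymity constraint is violated.
    """
    groups = defaultdict(list)
    for p, g in zip(points, labels):
        groups[g].append(p)
    if min(len(members) for members in groups.values()) < k:
        raise ValueError("k-anonymity constraint violated")
    centroids = {
        g: tuple(sum(coord) / len(members) for coord in zip(*members))
        for g, members in groups.items()
    }
    released = [centroids[g] for g in labels]
    # Assumed distortion measure: mean squared Euclidean error per point.
    mse = sum(
        sum((a - b) ** 2 for a, b in zip(p, q))
        for p, q in zip(points, released)
    ) / len(points)
    return released, mse


# Made-up device locations, for illustration only.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0),
          (10.0, 10.0), (11.0, 10.0), (10.0, 11.0), (12.0, 12.0)]
labels = naive_partition(points, k=3)
released, mse = aggregate(points, labels, k=3)
print(released, mse)
```

In this sketch every released location is shared by at least k devices, which is the anonymity criterion; a distortion-optimized method would aim to choose the groups so that `mse` is as small as possible under that same constraint.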
