A clustering approach to anonymize locations during dataset de-identification

Companies increasingly rely on massive amounts of data for strategic decision making. To optimize business intelligence, they often try to enrich their models with datasets acquired from third parties. Datasets containing sensitive attributes must be anonymized before release. For large datasets containing microdata, a frequently applied anonymization technique is data generalization, with the goal of satisfying privacy metrics such as k-anonymity. Location is a frequently recurring and strategically important attribute in many use cases. Multiple strategies can be employed to obfuscate precise coordinates. For example, the least significant digits of a coordinate can be dropped, or the coordinates can be replaced by a ZIP code. While these methods may be useful in some applications, they often result in too much information loss, which undermines strategic decision making. This paper proposes a novel approach that anonymizes location by means of clustering. Its feasibility is evaluated and compared to traditional techniques.
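As an illustration of the traditional digit-dropping technique mentioned above, the following minimal sketch truncates coordinates to a fixed number of decimals and reports the size of the smallest resulting group, which is the k that the generalized location attribute would support on its own. The function names and the sample coordinates are hypothetical and are not taken from the paper; it does not implement the clustering approach the paper proposes.

```python
import math
from collections import Counter


def truncate(value, decimals):
    """Drop all digits after the given number of decimal places."""
    factor = 10 ** decimals
    return math.trunc(value * factor) / factor


def generalize(points, decimals):
    """Generalize (lat, lon) pairs by truncating both coordinates."""
    return [(truncate(lat, decimals), truncate(lon, decimals))
            for lat, lon in points]


def smallest_group(points):
    """Size of the smallest group of identical generalized locations."""
    return min(Counter(points).values())


# Hypothetical sample coordinates, for illustration only.
points = [
    (50.8798, 4.7005),
    (50.8791, 4.7012),
    (50.8823, 4.7089),
    (51.2194, 4.4025),
    (51.2201, 4.4051),
]

for decimals in (2, 1):
    k = smallest_group(generalize(points, decimals))
    print(f"{decimals} decimals kept: smallest group has {k} record(s)")
```

Keeping fewer decimals enlarges the groups but places every record on a coarse fixed grid regardless of how the data is distributed, which is the kind of information loss the abstract refers to.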
