Study on Record Linkage of Anonymized Data

SUMMARY Data anonymization is required before a big-data business can make effective use of personal information without compromising its privacy. Choosing the best algorithm to anonymize given data securely for a given purpose is not trivial. Accurately assessing the risk of data being compromised requires balancing utility against security. We therefore propose a competition for the best anonymization and re-identification algorithms, using common pseudo microdata. This paper reports the results of the competition and an analysis of the effectiveness of the anonymization techniques. The results reveal a tradeoff between utility and security; on average, 20.9% of records were re-identified.
key words: data privacy, anonymization, re-identification risk, big data
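The summary reports re-identification as the share of released records that an attacker links back to their source, but it does not say how linkage was performed or scored. The sketch below is therefore only an illustration under stated assumptions: each record is a tuple of numeric quasi-identifiers, the attacker links every anonymized record to the nearest original record by plain (unnormalized) Euclidean distance, and the re-identification rate is the fraction of correct links. The names `link_records` and `reidentification_rate` are hypothetical and are not taken from the competition.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical record type: a tuple of numeric quasi-identifier values.
Record = Tuple[float, ...]


def link_records(original: Dict[str, Record], released: List[Record]) -> List[str]:
    """Nearest-neighbour record linkage (an assumed attack model).

    For each released (anonymized) record, guess the id of the original
    record whose quasi-identifiers are closest in unnormalized Euclidean
    distance. Ties go to the id that comes first in dict order.
    """
    guesses = []
    for values in released:
        guess = min(original, key=lambda rid: math.dist(original[rid], values))
        guesses.append(guess)
    return guesses


def reidentification_rate(guesses: List[str], truth: List[str]) -> float:
    """Fraction of released records linked back to their true source record."""
    if not truth:
        return 0.0
    hits = sum(g == t for g, t in zip(guesses, truth))
    return hits / len(truth)


if __name__ == "__main__":
    # Toy data: ids A, B, C with two numeric quasi-identifiers each.
    original = {"A": (30.0, 100.0), "B": (45.0, 200.0), "C": (60.0, 150.0)}
    truth = ["A", "C", "B"]  # source of each released row, hidden from the attacker
    light = [(32.0, 110.0), (50.0, 160.0), (44.0, 190.0)]
    noisier = [(32.0, 110.0), (50.0, 180.0), (44.0, 190.0)]
    print(reidentification_rate(link_records(original, light), truth))    # 1.0
    print(reidentification_rate(link_records(original, noisier), truth))  # 2/3
```

With the lightly perturbed release every row is linked correctly; perturbing one attribute of record C more strongly makes the attacker mistake it for B, lowering the rate to 2/3. Stronger perturbation tends to lower re-identification but also distorts the data more, which is the utility-versus-security tradeoff the summary describes. In practice, attributes on different scales would usually be normalized before computing distances.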
