Towards optimal sensitivity-based anonymization for big data

Datasets containing private and sensitive information are useful for data analytics. Data owners cautiously release such sensitive data using privacy-preserving publishing techniques. Personal re-identification possibility is much larger than ever before. For instance, social media has dramatically increased the exposure to privacy violation. One well-known technique of k-anonymity proposes a protection approach against privacy exposure. K-anonymity tends to find k equivalent number of data records. The chosen attributes are known as Quasiidentifiers. This approach may reduce the personal reidentification. However, this may lessen the usefulness of information gained. The value of k should be carefully determined, to compromise both security and information gained. Unfortunately, there is no any standard procedure to define the value of k. The problem of the optimal k-anonymization is NP-hard. In this paper, we propose a greedy-based heuristic approach that provides an optimal value for k. The approach evaluates the empirical risk concerning our Sensitivity-Based Anonymization method. Our approach is derived from the finegrained access and business role anonymization for big data, which forms our framework.

[1]  Adam Meyerson,et al.  On the complexity of optimal K-anonymity , 2004, PODS.

[2]  Ramesh Hariharan,et al.  Enhancing Privacy Preservation in Data Mining using Cluster based Greedy Method in Hierarchical Approach , 2016 .

[3]  Mohammed Al-Zobbi,et al.  Sensitivity-Based Anonymization of Big Data , 2016, 2016 IEEE 41st Conference on Local Computer Networks Workshops (LCN Workshops).

[4]  R. Motwani,et al.  Efficient Algorithms for Masking and Finding Quasi-Identifiers , 2007 .

[5]  Matthew Smith,et al.  Big data privacy issues in public social media , 2012, 2012 6th IEEE International Conference on Digital Ecosystems and Technologies (DEST).

[6]  Roberto J. Bayardo,et al.  Data privacy through optimal k-anonymization , 2005, 21st International Conference on Data Engineering (ICDE'05).

[7]  Justin Reich,et al.  Privacy, Anonymity, and Big Data in the Social Sciences , 2014 .

[8]  Kyuseok Shim,et al.  Approximate algorithms for K-anonymity , 2007, SIGMOD '07.

[9]  Shui Yu,et al.  Big Privacy: Challenges and Opportunities of Privacy Study in the Age of Big Data , 2016, IEEE Access.

[10]  Matthew Morgenstern,et al.  Security and inference in multilevel database and knowledge-base systems , 1987, SIGMOD '87.

[11]  Latanya Sweeney,et al.  Achieving k-Anonymity Privacy Protection Using Generalization and Suppression , 2002, Int. J. Uncertain. Fuzziness Knowl. Based Syst..

[12]  Gultekin Özsoyoglu,et al.  Controlling FD and MVD Inferences in Multilevel Relational Database Systems , 1991, IEEE Trans. Knowl. Data Eng..

[13]  Joseph K. Liu,et al.  Toward efficient and privacy-preserving computing in big data era , 2014, IEEE Network.

[14]  Toru Nakamura,et al.  k-anonymity: Risks and the Reality , 2015, 2015 IEEE Trustcom/BigDataSE/ISPA.