Sensitivity-Based Anonymization of Big Data

Data analytics is widely used as a means of extracting useful information from available data, so it is only natural that it is increasingly adapted for processing big data. The rapidly growing demand for big data analytics has several undesirable side effects. Perhaps the most significant of these is the increased risk of data disclosure and privacy violations. Data anonymization offers promising solutions for minimizing such risks. In this paper, we discuss some of the specific requirements of the anonymization process when dealing with big data. We show that, in general, information loss results from the avoidable generalization of similar or equivalent data. Building on this analysis, we propose a novel framework for data anonymization that extends the properties and concepts of k-anonymity and takes both the data class values and the sensitivity of the data into account. As a result, the proposed process can use a bottom-up approach, in contrast to most other anonymization methods. Top-down approaches usually generalize all records, equivalent and non-equivalent alike. Ours is more methodical, as it avoids generalizing records that are already equivalent. With the inclusion of sensitivity levels, we demonstrate that our framework can reduce the number of iteration steps and the time required to finalize the anonymization, and therefore improve the overall efficiency of the process.
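
The bottom-up idea described above can be illustrated with a minimal sketch. It is not the framework proposed in the paper: the single quasi-identifier (age), the three-level generalization hierarchy, and the function names `generalize_age` and `anonymize` are all assumptions made for illustration, and sensitivity levels are omitted because the abstract does not specify how they enter the process. The sketch only shows how records that already form an equivalence class of size at least k can be released without further generalization, while the remaining records move up one level.

```python
from collections import Counter


def generalize_age(age, level):
    """Hypothetical hierarchy: 0 = exact age, 1 = decade range, 2 = suppressed."""
    if level == 0:
        return str(age)
    if level == 1:
        low = (age // 10) * 10
        return f"{low}-{low + 9}"
    return "*"


def anonymize(records, k, max_level=2):
    """Bottom-up sketch over (age, sensitive_value) records.

    At each level, records whose generalized age is shared by at least k
    pending records are finalized as-is; only the rest are generalized
    further. Records still pending at max_level are suppressed ("*"), and
    that residual group may contain fewer than k records.
    """
    result = [None] * len(records)
    pending = list(range(len(records)))
    for level in range(max_level + 1):
        labels = {i: generalize_age(records[i][0], level) for i in pending}
        counts = Counter(labels.values())
        still_pending = []
        for i in pending:
            if counts[labels[i]] >= k or level == max_level:
                result[i] = (labels[i], records[i][1])
            else:
                still_pending.append(i)
        pending = still_pending
        if not pending:
            break
    return result


if __name__ == "__main__":
    data = [(34, "flu"), (34, "cold"), (36, "flu"), (38, "asthma"), (52, "flu")]
    for row in anonymize(data, k=2):
        print(row)
```

In this toy input the two records with age 34 are already equivalent and keep their exact value, the records with ages 36 and 38 are generalized once to a decade range, and only the isolated record is suppressed. A top-down method starting from the most general value would instead have to specialize all five records.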
