C-mixture and multi-constraints based genetic algorithm for collaborative data publishing

Abstract The growing use of distributed databases has created high demand for data sharing, so that useful information can be updated and accessed without interruption. Sharing distributed databases raises a serious security issue, because the databases contain sensitive personal information. Privacy-preserving data publishing, which aims to protect sensitive information while still releasing useful information, has therefore received considerable research attention in recent years. In this work, a new privacy measure called c-mixture is introduced to maintain the privacy constraint without affecting the utility of the database. To apply the proposed privacy measure to privacy-preserving data publishing, a new algorithm called CPGEN is developed using a genetic algorithm and multi-objective constraints. The proposed multi-objective optimization considers multiple privacy constraints together with a utility measurement. CPGEN is also adapted to handle the cold-start problem, which commonly occurs in distributed databases. The proposed algorithm is evaluated on the Adult dataset, and its performance is analyzed quantitatively using the generalized information loss and average equivalence class size metrics. The experiments show that the proposed algorithm maintains privacy and utility when compared with the existing algorithm.
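As a rough illustration of one of the evaluation metrics named above, the sketch below computes the average equivalence class size in the form commonly used in the k-anonymity literature: (number of records / number of equivalence classes) / k. The abstract does not give the formula, so this definition is an assumption, and the function name, the parameter `k`, and the toy table are hypothetical and not taken from the paper.

```python
from collections import Counter


def average_equivalence_class_size(records, quasi_identifiers, k=1):
    """Assumed definition: (number of records / number of classes) / k.

    An equivalence class is the set of records that share the same
    (already generalized) values on every quasi-identifier.
    """
    if not records:
        raise ValueError("records must be non-empty")
    # Count how many records fall into each quasi-identifier combination.
    classes = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return (len(records) / len(classes)) / k


# Hypothetical, already-generalized toy table (not data from the paper).
table = [
    {"age": "30-39", "zip": "476**", "disease": "flu"},
    {"age": "30-39", "zip": "476**", "disease": "cancer"},
    {"age": "40-49", "zip": "479**", "disease": "flu"},
    {"age": "40-49", "zip": "479**", "disease": "asthma"},
]

score = average_equivalence_class_size(table, ["age", "zip"], k=2)
print(score)
```

Under this assumed definition, values close to 1 mean the equivalence classes are about as small as the privacy parameter allows, which usually indicates less generalization and better utility.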
