Research on improved privacy publishing algorithm based on set cover

With the advent of the big data era, data release has become a hot topic in the database community, and data privacy is drawing growing attention from users. Among the privacy protection models proposed so far, differential privacy is widely used because of its advantages over other models. However, for the private release of multi-dimensional data sets, existing algorithms usually publish data with low availability, because the noise in the released data grows rapidly as the number of dimensions increases. To address this issue, we propose algorithms based on regular and irregular marginal tables of frequent item sets that protect privacy while improving availability. The main idea is to reduce the dimensionality of the data set and to achieve differential privacy protection with Laplace noise. First, we propose a marginal table cover algorithm based on frequent items that considers the effectiveness of query cover combinations, and obtain a regular marginal table cover set that is smaller in size but offers higher data availability. Second, we propose a differential privacy model with irregular marginal tables for application scenarios with low data availability and high cover rate. Third, through our analysis we obtain an approximately optimal marginal table cover algorithm that yields a query cover set satisfying the multi-level query policy constraint, thereby balancing privacy protection and data availability. Finally, extensive experiments on synthetic and real databases demonstrate that the proposed method performs better than state-of-the-art methods in most cases.
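
The core idea of publishing a low-dimensional marginal table with Laplace noise can be illustrated with a minimal sketch. It is not the paper's algorithm: the function names (`laplace_noise`, `noisy_marginal`), the toy binary data, and the choice of attributes are all hypothetical, and the sketch assumes neighboring data sets differ by adding or removing one record, so that a count table has L1 sensitivity 1 and the noise scale is 1/epsilon.

```python
import itertools
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential variables with mean `scale`
    # follows a Laplace distribution with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_marginal(records, attrs, domain_sizes, epsilon, rng):
    """Return a Laplace-perturbed marginal table over the attributes `attrs`.

    Assumption (not from the paper): one record added or removed changes
    exactly one cell count by 1, so the L1 sensitivity is 1.
    """
    # Project every record onto the selected attributes and count the cells.
    counts = Counter(tuple(r[a] for a in attrs) for r in records)
    scale = 1.0 / epsilon
    # Enumerate every cell of the marginal, including empty ones,
    # so that zero counts are perturbed as well.
    cells = itertools.product(*(range(domain_sizes[a]) for a in attrs))
    return {cell: counts[cell] + laplace_noise(scale, rng) for cell in cells}


# Toy data: 1000 records over 4 binary attributes (hypothetical).
rng = random.Random(0)
records = [tuple(rng.randrange(2) for _ in range(4)) for _ in range(1000)]
table = noisy_marginal(records, attrs=(0, 2), domain_sizes=[2, 2, 2, 2],
                       epsilon=1.0, rng=rng)
```

In this toy setting the two-attribute marginal has 4 noisy cells, whereas the full table over all four attributes would have 16, each carrying its own noise. This is one way to see why reducing the dimension before adding noise can improve availability.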
