Privacy preserving frequent itemset mining: Maximizing data utility based on database reconstruction

Abstract Frequent itemset mining (FIM) over large-scale databases plays a significant part in many knowledge discovery tasks, but it also opens the door to privacy breaches. Privacy preserving frequent itemset mining (PPFIM) has therefore drawn increasing attention, with the ultimate goal of hiding sensitive frequent itemsets (SFIs) so that no confidential knowledge can be uncovered from the resulting database. However, the vast majority of proposed PPFIM methods rely solely on database perturbation, which may incur a significant loss of data utility in order to conceal all SFIs. To alleviate this issue, this paper proposes a database reconstruction-based algorithm for PPFIM (DR-PPFIM) that achieves a high degree of privacy while retaining reasonable data utility. In DR-PPFIM, all SFIs and their related frequent itemsets are first identified and removed in a pre-sanitization step, using a sanitization method devised for this purpose. From the remaining frequent itemsets, a novel database reconstruction scheme then builds an appropriate database by efficiently integrating the concepts of inverse frequent itemset mining (IFIM) and database extension. In this way, all SFIs can be hidden under the same mining threshold while the data utility of the synthetic database is maximized. DR-PPFIM also includes a further hiding strategy that lowers the significance of SFIs even more, reducing the risk of disclosing confidential knowledge. Extensive comparative experiments on real databases demonstrate the superiority of DR-PPFIM in terms of maximizing data utility and resisting potential threats.
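The pre-sanitization idea can be illustrated with a minimal Python sketch. It is not the DR-PPFIM sanitization method itself, which the abstract does not specify. It assumes that the "related frequent itemsets" are the frequent supersets of each SFI, which follows from the anti-monotone property of support: a superset of an itemset cannot be frequent unless the itemset is. The function names `mine_frequent_itemsets` and `pre_sanitize`, the brute-force miner, and the toy transactions are all hypothetical.

```python
from itertools import combinations


def mine_frequent_itemsets(transactions, min_support):
    """Brute-force FIM: map each frequent itemset to its support count."""
    items = sorted({item for t in transactions for item in t})
    frequent = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            support = sum(1 for t in transactions if itemset <= t)
            if support >= min_support:
                frequent[itemset] = support
                found_any = True
        # Anti-monotonicity: no frequent k-itemset means no larger one exists.
        if not found_any:
            break
    return frequent


def pre_sanitize(frequent, sensitive):
    """Drop each sensitive itemset and every frequent superset of one
    (assumed reading of 'related frequent itemsets')."""
    return {
        itemset: support
        for itemset, support in frequent.items()
        if not any(sfi <= itemset for sfi in sensitive)
    }


# Toy data, made up for illustration only.
transactions = [
    frozenset("abc"),
    frozenset("ab"),
    frozenset("ac"),
    frozenset("bc"),
    frozenset("abc"),
]
frequent = mine_frequent_itemsets(transactions, min_support=2)
sensitive = [frozenset("ab")]
remaining = pre_sanitize(frequent, sensitive)
```

With these toy transactions, `remaining` keeps the itemsets {a}, {b}, {c}, {a, c} and {b, c}, and drops {a, b} together with its superset {a, b, c}. A reconstruction step would then have to produce a database whose frequent itemsets, under the same threshold, match `remaining` as closely as possible.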
