Top-k closed co-occurrence patterns mining with differential privacy over multiple streams

Abstract Frequent pattern mining over data streams is an important problem for many applications. However, most existing studies consider a single stream in which every transaction is treated as independent, ignoring that some transactions are generated by the same individual. Many real-world applications, such as emerging topic discovery, e-commerce, web usage pattern mining and location-based services, involve multiple streams that continuously generate objects, and the interesting observations are the objects that appear in many streams. In this paper, we analyze the privacy problems in mining top-k closed co-occurrence patterns over multiple streams that are caused by a single release of a window and by continuous releases over successive windows. To prevent privacy leakage, we propose a differentially private algorithm for mining top-k closed co-occurrence patterns across multiple streams, based on the exponential mechanism and the Laplace mechanism. The algorithm consists of a dissimilarity calculation phase and a differentially private mining phase; the latter adjusts the CP-Graph by transaction splitting, perturbs the CP-Graph to obtain the candidate set of top-k closed co-occurrence patterns, and adds noise to the supports of the patterns. Finally, we prove that our algorithm satisfies differential privacy, and experimental results demonstrate its utility and efficiency.
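To make the two mechanisms named in the abstract concrete, the following is a minimal Python sketch of the generic pattern "select k candidates with the exponential mechanism, then perturb their supports with Laplace noise". It is not the algorithm of the paper: it omits the CP-Graph, the dissimilarity calculation phase and transaction splitting. The function names, the even split of the privacy budget over the k selections and the k noisy counts, and the default sensitivity of 1 are all assumptions made for illustration.

```python
import math
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def exponential_mechanism_top_k(supports, k, epsilon, sensitivity, rng):
    """Pick k distinct patterns without replacement.

    Each pick uses budget epsilon / k (assumed even split) and selects a
    pattern with probability proportional to
    exp(eps_per_pick * support / (2 * sensitivity)).
    """
    eps_per_pick = epsilon / k
    remaining = dict(supports)
    chosen = []
    for _ in range(min(k, len(remaining))):
        patterns = list(remaining)
        top = max(remaining.values())  # shift scores to avoid overflow in exp
        weights = [
            math.exp(eps_per_pick * (remaining[p] - top) / (2.0 * sensitivity))
            for p in patterns
        ]
        pick = rng.choices(patterns, weights=weights, k=1)[0]
        chosen.append(pick)
        del remaining[pick]
    return chosen


def private_top_k(supports, k, eps_select, eps_noise, sensitivity=1.0, seed=None):
    """Return {pattern: noisy support} for k privately selected patterns."""
    rng = random.Random(seed)
    selected = exponential_mechanism_top_k(supports, k, eps_select, sensitivity, rng)
    # Assumed even split of eps_noise over the k released counts.
    scale = k * sensitivity / eps_noise
    return {p: supports[p] + laplace_noise(scale, rng) for p in selected}


# Hypothetical supports of candidate co-occurrence patterns in one window.
supports = {("a", "b"): 40, ("a", "c"): 35, ("b", "c"): 12, ("c", "d"): 5}
noisy = private_top_k(supports, k=2, eps_select=1.0, eps_noise=1.0, seed=7)
for pattern, value in noisy.items():
    print(pattern, round(value, 2))
```

With a small budget the selected patterns and their supports can differ noticeably from the true top-k; as the budget grows, the output approaches the exact answer. The total budget spent by this sketch on one window is `eps_select + eps_noise` under sequential composition.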
