Collaborative Graph-Based Mechanism for Distributed Big Data Leakage Prevention

Data leakage is a growing insider threat to data owners, and several studies have addressed data leakage prevention (DLP). In the era of big data, massive amounts of data are constantly generated by various institutions. In many applications, multiple institutions want to share data with each other to extract more value without leaking private data. This raises a new challenge for DLP in the big data scenario: training a global detection model over distributed big data sets without breaking the data privacy of each party. Moreover, as data takes increasingly complicated forms, the model must also tolerate data transformation. In this paper, we propose a Collaborative graph-based mechanism for Distributed big data Leakage Detection (CoDLD). CoDLD addresses the collaborative DLP problem in three ways. First, it transforms the text detection problem into graph space, where the local weighted graphs of the data owners are constructed iteratively, one owner after another. Second, it protects the privacy of each data owner by applying graph masking to the local weighted graphs. Third, it applies a partition-based method to the graph to perform accurate matching detection efficiently. Experimental results show that our method performs collaborative data leak detection over distributed data with high accuracy and efficiency.
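
The abstract does not specify how CoDLD builds its graphs, masks them, or partitions them, so the following is only a minimal sketch of the three-step pipeline under loudly stated assumptions: nodes are lower-cased word tokens, an edge's weight counts co-occurrences within a small sliding window, "graph masking" is approximated by replacing each term with a keyed HMAC (assuming a key shared by all owners), and "partitioning" hashes each edge into a fixed number of buckets so a lookup touches only one bucket. The function names `weighted_graph`, `mask`, `partition`, and `leak_score`, and the overlap score itself, are hypothetical illustrations, not the authors' method.

```python
import hashlib
import hmac
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case word tokens (a simplifying assumption, not CoDLD's preprocessing)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def weighted_graph(text: str, window: int = 2) -> Counter:
    """Undirected weighted graph: weight of edge (u, v) = co-occurrences within `window` tokens."""
    tokens = tokenize(text)
    graph: Counter = Counter()
    for i, u in enumerate(tokens):
        for v in tokens[i + 1 : i + 1 + window]:
            if u != v:
                graph[tuple(sorted((u, v)))] += 1
    return graph


def mask(graph: Counter, key: bytes) -> Counter:
    """Hypothetical masking: replace each term by a keyed HMAC so raw terms are never shared."""
    def h(term: str) -> str:
        return hmac.new(key, term.encode(), hashlib.sha256).hexdigest()[:16]

    masked: Counter = Counter()
    for (u, v), w in graph.items():
        masked[tuple(sorted((h(u), h(v))))] += w
    return masked


def bucket_of(edge: tuple, parts: int) -> int:
    """Stable hash of an edge to a partition index (placeholder partitioning rule)."""
    return int(hashlib.sha256("|".join(edge).encode()).hexdigest(), 16) % parts


def partition(graph: Counter, parts: int) -> list[Counter]:
    """Split the global graph's edges into `parts` buckets."""
    buckets = [Counter() for _ in range(parts)]
    for edge, w in graph.items():
        buckets[bucket_of(edge, parts)][edge] += w
    return buckets


def leak_score(doc: Counter, buckets: list[Counter]) -> float:
    """Fraction of the document's edge weight also present in the sensitive graph."""
    total = sum(doc.values())
    hit = sum(min(w, buckets[bucket_of(e, len(buckets))].get(e, 0)) for e, w in doc.items())
    return hit / total if total else 0.0


# Usage: two owners contribute masked local graphs to a shared graph, one after another.
KEY = b"shared-secret"  # hypothetical key assumed to be shared by all owners
owners = [
    "the merger with acme closes in march at forty dollars per share",
    "acme board approved the merger terms in a private session",
]
global_graph: Counter = Counter()
for text in owners:
    global_graph += mask(weighted_graph(text), KEY)
buckets = partition(global_graph, parts=4)

leaked = mask(weighted_graph("rumor: the merger with acme closes in march"), KEY)
clean = mask(weighted_graph("lunch menu for friday includes soup"), KEY)
print(round(leak_score(leaked, buckets), 2), round(leak_score(clean, buckets), 2))  # prints 0.85 0.0
```

Because the masked edges are compared by exact match, this toy scorer tolerates word insertions around a leaked phrase (11 of the 13 edges in the rumor sentence still match) but not synonyms or paraphrase; a real system would need a richer graph model for stronger transformations.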
