A Novel Mechanism for Fast Detection of Transformed Data Leakage

Data leakage is a growing insider threat to the information security of organizations and individuals. A range of methods has been developed for data leakage prevention (DLP). In the big data era, however, large amounts of unstructured data must be inspected. As the volume of data grows dramatically and its forms become much more complicated, handling large amounts of transformed data poses a new challenge for DLP. We propose an adaptive weighted graph walk model that addresses this problem by mapping it to the domain of weighted graphs. Our approach proceeds in three steps. First, adaptive weighted graphs are built to quantify the sensitivity of the tested data based on its context. Then, an improved label propagation algorithm is used to improve scalability to fresh data. Finally, a low-complexity score walk algorithm determines the final sensitivity. Experimental results show that the proposed method detects leaks of transformed or fresh data quickly and efficiently.
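
The first two steps can be illustrated with a minimal sketch. It is not the paper's adaptive weighting or its improved label propagation: it assumes a plain term co-occurrence graph and the classic clamped label propagation scheme, in which each unlabeled term repeatedly takes the weighted average of its neighbors' scores. The names `build_graph` and `propagate`, the window size, and the neutral starting score of 0.5 are all hypothetical choices made for illustration.

```python
from typing import Dict, List

Graph = Dict[str, Dict[str, float]]


def build_graph(tokens: List[str], window: int = 2) -> Graph:
    """Build an undirected co-occurrence graph.

    The edge weight counts how often two distinct terms appear within
    `window` positions of each other (an assumed, simplified notion of context).
    """
    graph: Graph = {}
    for i, a in enumerate(tokens):
        graph.setdefault(a, {})
        for b in tokens[i + 1 : i + 1 + window]:
            if a == b:
                continue
            graph.setdefault(b, {})
            graph[a][b] = graph[a].get(b, 0.0) + 1.0
            graph[b][a] = graph[b].get(a, 0.0) + 1.0
    return graph


def propagate(
    graph: Graph,
    seeds: Dict[str, float],
    iterations: int = 100,
    tol: float = 1e-6,
) -> Dict[str, float]:
    """Spread sensitivity scores from labeled seed terms to unlabeled terms.

    Seed scores stay clamped; every other node takes the weighted average
    of its neighbors' current scores until the largest change is below `tol`.
    """
    # Unlabeled terms start from a neutral score (an assumption of this sketch).
    scores = {node: seeds.get(node, 0.5) for node in graph}
    for _ in range(iterations):
        delta = 0.0
        updated: Dict[str, float] = {}
        for node, neighbors in graph.items():
            total = sum(neighbors.values())
            if node in seeds or total == 0.0:
                # Clamped seed or isolated node: keep the current score.
                updated[node] = scores[node]
                continue
            new = sum(w * scores[m] for m, w in neighbors.items()) / total
            delta = max(delta, abs(new - scores[node]))
            updated[node] = new
        scores = updated
        if delta < tol:
            break
    return scores
```

With seeds such as `{"salary": 1.0, "weather": 0.0}`, terms that co-occur mostly with "salary" drift toward 1.0 and terms near "weather" drift toward 0.0, which is how previously unseen (fresh) terms can receive a sensitivity estimate without being labeled directly.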
