Data security rules/regulations based classification of file data using TsF-kNN algorithm

Personal and organisational data grow in volume over time. Because data are so important to organisations, their effective and efficient management and categorisation need special attention. Understanding data security policies and applying them to the appropriate data types is therefore a core concern for large organisations such as cloud service providers. With data classification, the security requirements of the data can be identified without manual intervention, and encryption can be applied only to the confidential data, which saves encryption time, decryption time, storage, and processing power. The proposed data classification approach aims to reduce network traffic, additional data movement, and overload, and it allows the storage location for confidential data to be chosen so that the security requirements of those data are fulfilled. In this paper, an intelligent data classification approach is presented for predicting the confidentiality/sensitivity level of the data in a file based on corporate objectives and government policies/rules. An enhanced version of the k-NN algorithm, called Training dataset Filtration-kNN (TsF-kNN), is also proposed to reduce the computational complexity of the traditional k-NN algorithm in the data classification phase. The experimental results show that the data in a file can be classified into confidential and non-confidential classes and that the TsF-kNN algorithm performs better than the traditional k-NN and Naïve Bayes algorithms.
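The abstract does not specify how the training dataset is filtered, so the following is only a minimal sketch of the general idea: shrink the training set before the k-NN distance computation so that fewer comparisons are needed at classification time. The filtering rule used here (keep only training examples that share at least one term with the query), the bag-of-words representation, the cosine similarity, and all names and sample texts are assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; no stop-word removal)."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def filter_training_set(train, query_vec):
    """Hypothetical filtration step: keep only examples sharing a term with the query."""
    return [(vec, label) for vec, label in train
            if any(term in vec for term in query_vec)]


def knn_classify(train, text, k=3, use_filter=True):
    """Majority vote over the k most similar (optionally pre-filtered) examples."""
    query_vec = Counter(tokenize(text))
    candidates = filter_training_set(train, query_vec) if use_filter else train
    if not candidates:  # fall back to the full set if the filter removes everything
        candidates = train
    ranked = sorted(candidates, key=lambda ex: cosine(query_vec, ex[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Illustrative training data, invented for this sketch.
TRAIN = [(Counter(tokenize(text)), label) for text, label in [
    ("patient diagnosis and medical record number", "confidential"),
    ("employee salary and bank account number", "confidential"),
    ("credit card number and billing address", "confidential"),
    ("cafeteria lunch menu for the week", "non-confidential"),
    ("public press release about the office opening", "non-confidential"),
    ("weekly newsletter and event schedule", "non-confidential"),
]]

if __name__ == "__main__":
    print(knn_classify(TRAIN, "bank account number of the patient"))
    print(knn_classify(TRAIN, "lunch menu and event schedule"))
```

In this sketch the saving comes from computing similarities only against the filtered candidates. A practical filter would also need to handle stop words, which otherwise let almost every training example pass.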
