Confidentiality Based File Attributes and Data Classification Using TsF-KNN

Machine Learning (ML) plays an important role in electronic data management. Managing data manually, without ML, or with ML that relies only on metadata, is costly and difficult. Many ML algorithms have been proposed to solve different data management issues, but predicting which data in a file are confidential and which are non-confidential remains an open research challenge. A file cannot be assigned to a single category/class because the data in a single file may fall into different categories/classes. The main objective of this study is to predict the confidential and non-confidential data of a file using the K-NN algorithm. We also propose a method called the Training dataset Filtration Key Nearest Neighbour (TsF-KNN) classifier, which classifies the data of a file based on the confidentiality level of the file's schema (its file attributes). The proposed algorithm, TsF-KNN, is time-efficient and achieves higher accuracy than traditional K-NN algorithms.
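As an illustration only, the general idea of filtering the training set before running K-NN can be sketched as follows. This is not the TsF-KNN algorithm itself, whose details are not given here. The sketch assumes that each training record carries the confidentiality level of the attribute it came from, that filtering means keeping only records with the same level as the attribute being classified, and that distances are Euclidean. The names `TRAINING`, `knn_predict`, and `filtered_knn_predict`, as well as the toy feature vectors, are hypothetical.

```python
import math
from collections import Counter

# Hypothetical training records (toy data, not from the study):
# (attribute confidentiality level, feature vector, class label)
TRAINING = [
    ("high", (0.9, 0.8), "confidential"),
    ("high", (0.8, 0.9), "confidential"),
    ("high", (0.7, 0.2), "non-confidential"),
    ("low", (0.1, 0.2), "non-confidential"),
    ("low", (0.2, 0.1), "non-confidential"),
    ("low", (0.9, 0.9), "confidential"),
]


def knn_predict(train, query, k=3):
    """Plain K-NN: majority label among the k nearest records (Euclidean distance)."""
    nearest = sorted(train, key=lambda rec: math.dist(rec[1], query))[:k]
    votes = Counter(label for _, _, label in nearest)
    return votes.most_common(1)[0][0]


def filtered_knn_predict(train, attribute_level, query, k=3):
    """Assumed filtration step: keep only records whose attribute has the
    same confidentiality level as the queried attribute, then run K-NN."""
    subset = [rec for rec in train if rec[0] == attribute_level]
    # Fall back to the full training set if the filter leaves fewer than k records.
    if len(subset) < k:
        subset = train
    return knn_predict(subset, query, k)


if __name__ == "__main__":
    print(filtered_knn_predict(TRAINING, "high", (0.85, 0.85)))
    print(filtered_knn_predict(TRAINING, "low", (0.15, 0.15)))
```

Under these assumptions, the filter shrinks the set of records that must be compared against each query, which is one plausible source of the time savings claimed for a filtered K-NN over a plain K-NN.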