A classification method for identifying confidential data to enhance efficiency of query processing over cloud

With the growing use of Database-as-a-Service (DAAS), several issues arise, especially in translating and executing queries to and from the database securely and efficiently. These issues stem from potential attacks such as attempts to copy or eavesdrop on the database via queries. Existing security mechanisms secure the queries with encryption. However, encrypting the queries significantly reduces the efficiency of query processing because of the overhead of the encryption and decryption processes. This study addresses this problem by proposing a divide-and-conquer strategy in which partial encryption is applied to the queries. The data are classified into sensitive and non-sensitive categories so that only the sensitive data are encrypted. The classification used in this study is based on the data classification policy of Columbia University. First, the data fields are manually annotated as sensitive or non-sensitive. Next, rules are generated to classify the queried data. If a query contains sensitive data, encryption is applied specifically to the sensitive data, whereas the non-sensitive data remain unencrypted. Experiments were conducted using real-time data from Baghdad University related to students’ information, consisting of 35 tables and 362 fields. The evaluation compares the security overhead of full encryption (without classification) and partial encryption (with classification) using the Advanced Encryption Standard (AES). The results show that the classification method significantly reduces the time needed to process a query. This implies that partial encryption based on classifying the data into sensitive and non-sensitive categories improves the efficiency of query processing.
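
The classify-then-encrypt idea described above can be illustrated with a minimal Python sketch. It is not the study's implementation: the field names in `SENSITIVE_FIELDS`, the helper names `is_sensitive` and `partially_encrypt`, and the sample row are all hypothetical. To keep the sketch runnable with the standard library alone, `placeholder_cipher` is a stand-in that only marks and reverses a value; it is not AES and provides no security. A real AES routine would be passed in its place.

```python
from typing import Callable, Dict, Set

# Hypothetical result of the manual annotation step: fields labelled as
# sensitive. These names are illustrative, not taken from the study's dataset.
SENSITIVE_FIELDS: Set[str] = {"national_id", "date_of_birth", "phone"}


def is_sensitive(field: str) -> bool:
    """Classification rule (assumed): a field is sensitive if it was annotated so."""
    return field.lower() in SENSITIVE_FIELDS


def partially_encrypt(
    row: Dict[str, str], encrypt: Callable[[str], str]
) -> Dict[str, str]:
    """Apply `encrypt` only to sensitive fields; leave the rest unchanged."""
    return {
        name: encrypt(value) if is_sensitive(name) else value
        for name, value in row.items()
    }


def placeholder_cipher(value: str) -> str:
    """Stand-in for AES so the sketch runs without third-party packages.

    This is NOT encryption. Replace it with a real AES routine in practice.
    """
    return "ENC(" + value[::-1] + ")"


# Usage: only the sensitive field passes through the cipher.
row = {"department": "Physics", "national_id": "12345"}
protected = partially_encrypt(row, placeholder_cipher)
print(protected)  # {'department': 'Physics', 'national_id': 'ENC(54321)'}
```

Passing the cipher as a parameter keeps the classification rule separate from the encryption routine, so the cost of full and partial encryption could be compared by swapping the rule while holding the cipher fixed.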
