Towards meaningful distance-preserving encryption

Mining complex data is an essential and at the same time challenging task. Therefore, organizations pass on their encrypted data to service providers carrying out such analyses. Thus, encryption must preserve the mining results. Many mining algorithms are distance-based. Thus, we investigate how to preserve the results for such algorithms upon encryption. To this end, we propose the notion of distance-preserving encryption (DPE). This notion has just the right strictness - we show that we cannot relax it, using formal arguments as well as experiments. Designing a DPE scheme is challenging, as it depends both on the data set and the specific distance measure in use. We propose a procedure to engineer DPE-schemes, dubbed DisPE. In a case study, we instantiate DisPE for SQL query logs, a type of data containing valuable information about user interests. In this study, we design DPE schemes for all SQL query distance measures from the scientific literature. We formally show that one can use a combination of existing secure property-preserving encryption schemes to this end. Finally, we discuss on the generalizability of our findings using two other data sets as examples.

[1]  Charles V. Wright,et al.  Inference Attacks on Property-Preserving Encrypted Databases , 2015, CCS.

[2]  Gunter Saake,et al.  QuEval: Beyond high-dimensional indexing a la carte , 2013, Proc. VLDB Endow..

[3]  William W. Cohen,et al.  A Comparison of String Metrics for Matching Names and Records , 2003 .

[4]  Nathan Chenette,et al.  Order-Preserving Encryption Revisited: Improved Security Analysis and Alternative Solutions , 2011, CRYPTO.

[5]  Diana Tri Susetianingtias,et al.  Query Rewriting and Corpus of Semantic Similarity as Encryption Method for Documents in Indonesian Language , 2016 .

[6]  Klemens Böhm,et al.  Distance-Based Data Mining over Encrypted Data , 2018, 2018 IEEE 34th International Conference on Data Engineering (ICDE).

[7]  Yannis Rouselakis,et al.  Property Preserving Symmetric Encryption , 2012, EUROCRYPT.

[8]  Donald Kossmann,et al.  Query Log Attack on Encrypted Databases , 2013, Secure Data Management.

[9]  Raymond T. Ng,et al.  Distance-based outliers: algorithms and applications , 2000, The VLDB Journal.

[10]  Klemens Böhm,et al.  Cleaning Antipatterns in an SQL Query Log , 2018, IEEE Transactions on Knowledge and Data Engineering.

[11]  Dan Suciu,et al.  SnipSuggest: Context-Aware Autocompletion for SQL , 2010, Proc. VLDB Endow..

[12]  Caroline Fontaine,et al.  A Survey of Homomorphic Encryption for Nonspecialists , 2007, EURASIP J. Inf. Secur..

[13]  Hari Balakrishnan,et al.  CryptDB: protecting confidentiality with encrypted query processing , 2011, SOSP.

[14]  Vincent Rijmen,et al.  The Design of Rijndael: AES - The Advanced Encryption Standard , 2002 .

[15]  Neoklis Polyzotis,et al.  SQL QueRIE recommendations , 2010, Proc. VLDB Endow..

[16]  Nathan Chenette,et al.  Order-Preserving Symmetric Encryption , 2009, IACR Cryptol. ePrint Arch..

[17]  Dawn Xiaodong Song,et al.  Practical techniques for searches on encrypted data , 2000, Proceeding 2000 IEEE Symposium on Security and Privacy. S&P 2000.

[18]  Petra Perner,et al.  Data Mining - Concepts and Techniques , 2002, Künstliche Intell..

[19]  D. Defays,et al.  An Efficient Algorithm for a Complete Link Method , 1977, Comput. J..

[20]  Klemens Böhm,et al.  Identifying User Interests within the Data Space - a Case Study with SkyServer , 2015, EDBT.

[21]  Rafail Ostrovsky,et al.  Public Key Encryption with Keyword Search , 2004, EUROCRYPT.

[22]  Fatos Xhafa,et al.  L-EncDB: A lightweight framework for privacy-preserving data queries in cloud computing , 2015, Knowl. Based Syst..

[23]  Whitfield Diffie,et al.  New Directions in Cryptography , 1976, IEEE Trans. Inf. Theory.

[24]  Ramakrishnan Srikant,et al.  Order preserving encryption for numeric data , 2004, SIGMOD '04.

[25]  Hans-Peter Kriegel,et al.  A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise , 1996, KDD.

[26]  Mihir Bellare,et al.  A concrete security treatment of symmetric encryption , 1997, Proceedings 38th Annual Symposium on Foundations of Computer Science.

[27]  Hae-Sang Park,et al.  A simple and fast algorithm for K-medoids clustering , 2009, Expert Syst. Appl..

[28]  Fevzi Alimo Methods of Combining Multiple Classiiers Based on Diierent Representations for Pen-based Handwritten Digit Recognition , 1996 .

[29]  Matteo Golfarelli,et al.  Mining Preferences from OLAP Query Logs for Proactive Personalization , 2011, ADBIS.

[30]  Pascal Paillier,et al.  Public-Key Cryptosystems Based on Composite Degree Residuosity Classes , 1999, EUROCRYPT.

[31]  Nikos Mamoulis,et al.  Secure kNN computation on encrypted databases , 2009, SIGMOD Conference.

[32]  Hans-Peter Kriegel,et al.  OPTICS-OF: Identifying Local Outliers , 1999, PKDD.