Privacy-aware Ranking with Tree Ensembles on the Cloud

Tree-based ensembles are widely used for document ranking, but supporting such models efficiently on the cloud under a privacy-preserving constraint remains an open research problem. The main challenge is that letting the cloud server perform ranking computation may unsafely reveal privacy-sensitive information. To address privacy in tree-based server-side ranking, this paper proposes, as a trade-off, to reduce the learning-to-rank model's dependence on composite features, and develops a comparison-preserving mapping to hide feature values and tree thresholds. To justify this approach, the presented analysis shows that a decision tree with simplifiable composite features can be transformed into another tree that uses raw features without increasing the training accuracy loss. This paper analyzes the privacy properties of the proposed scheme, compares the relevance of gradient boosting regression trees, LambdaMART, and random forests using raw features on several test data sets under the privacy consideration, and assesses the competitiveness of a hybrid model based on these algorithms.
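The property that makes a comparison-preserving mapping useful for tree ranking can be illustrated with a small sketch. Suppose every feature value and every threshold for a given feature pass through the same strictly increasing mapping. Then each `x < t` test in a tree gives the same answer on the mapped values, so a server can walk the mapped tree and reach the same leaf scores without seeing raw values. The mapping below (random positive gaps over a known, finite set of values per feature) and all names (`Node`, `build_mapping`, `map_tree`, the toy tree and documents) are hypothetical illustrations. They are not the paper's actual construction, and leaf scores are left unprotected here.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass
class Node:
    """Decision-tree node: internal nodes test x[feature] < threshold; leaves carry a score."""
    feature: int = -1
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    score: Optional[float] = None  # set only on leaves


def evaluate(node: Node, x: Sequence[float]) -> float:
    """Walk from the root to a leaf and return that leaf's score."""
    while node.score is None:
        node = node.left if x[node.feature] < node.threshold else node.right
    return node.score


def thresholds_of(node: Node, feature: int) -> List[float]:
    """Collect every threshold the tree uses for one feature."""
    if node.score is not None:
        return []
    own = [node.threshold] if node.feature == feature else []
    return own + thresholds_of(node.left, feature) + thresholds_of(node.right, feature)


def build_mapping(values: List[float], rng: random.Random) -> Dict[float, int]:
    """Hypothetical comparison-preserving mapping (assumes a finite, known value set):
    distinct values in sorted order get strictly increasing integer codes
    separated by random positive gaps, so a < b exactly when code(a) < code(b)."""
    mapping: Dict[float, int] = {}
    code = 0
    for v in sorted(set(values)):
        code += rng.randint(1, 1000)
        mapping[v] = code
    return mapping


def map_tree(node: Node, mappings: Dict[int, Dict[float, int]]) -> Node:
    """Copy the tree, replacing each threshold with its mapped code (leaf scores unchanged)."""
    if node.score is not None:
        return Node(score=node.score)
    return Node(
        feature=node.feature,
        threshold=mappings[node.feature][node.threshold],
        left=map_tree(node.left, mappings),
        right=map_tree(node.right, mappings),
    )


# Toy tree over two raw features (hypothetical values, for illustration only).
tree = Node(0, 2.5,
            left=Node(score=0.1),
            right=Node(1, 0.7, left=Node(score=0.4), right=Node(score=0.9)))
docs = [[1.0, 0.2], [2.5, 0.1], [3.1, 0.7], [4.0, 0.95]]

rng = random.Random(42)
# One mapping per feature, covering both document values and tree thresholds.
mappings = {f: build_mapping([d[f] for d in docs] + thresholds_of(tree, f), rng) for f in (0, 1)}
hidden_tree = map_tree(tree, mappings)
hidden_docs = [[mappings[f][d[f]] for f in (0, 1)] for d in docs]

# The server-side walk over mapped values reaches the same leaves as the plaintext walk.
for d, h in zip(docs, hidden_docs):
    assert evaluate(tree, d) == evaluate(hidden_tree, h)
```

Because only relative order survives the mapping, the server learns how values compare to thresholds but not the raw values themselves. This is also why such schemes must still account for what order information alone can leak.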
