Side-Channel Attacks on Shared Search Indexes

Full-text search systems, such as Elasticsearch and Apache Solr, enable document retrieval based on keyword queries. In many deployments these systems are multi-tenant: distinct users' documents reside in, and their queries are answered by, one or more shared search indexes. Large deployments may use hundreds of indexes across which user documents are randomly assigned. The results of a search query are filtered to remove documents to which a client should not have access. We show that modern multi-tenant search exhibits exploitable side channels. The starting point for our attacks is a decade-old observation that the TF-IDF scores used to rank search results can leak information about other users' documents. To the best of our knowledge, no prior work has demonstrated an attack exploiting this side channel in practice, and constructing a working side channel in real deployments requires overcoming numerous challenges. We nevertheless develop a new attack, called STRESS (Search Text RElevance Score Side channel), and in so doing show how an attacker can map out the number of indexes used by a service, obtain placement of a document within each index, and then exploit co-tenancy with all other users to (1) discover the terms in other tenants' documents or (2) determine the number of documents (belonging to other tenants) that contain a term of interest. In controlled experiments, we demonstrate the attacks on popular services such as GitHub and Xen.do. We conclude with a discussion of countermeasures.
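The leak rests on one property: if an index scores matches using corpus-wide statistics such as document frequency (df), and applies access control only after scoring, then a client's own scores depend on documents it cannot see. The toy sketch below illustrates this for a single shared index. It assumes a textbook idf of log(N/df) (real engines such as Lucene use smoothed variants), that the attacker is already co-located in the index, and that a randomly generated nonce term appears in no other tenant's document. `SharedIndex` and `estimate_df` are hypothetical names, and the sketch is a simplification for illustration, not the STRESS attack itself. Subtracting the probe term's score from the nonce's score cancels the unknown N and leaves log(df).

```python
import math
import secrets
from collections import Counter


class SharedIndex:
    """Toy model of ONE shared index: every tenant's documents feed the same
    corpus statistics (document count N and per-term document frequency df)."""

    def __init__(self):
        self.docs = []  # list of (tenant, term-frequency Counter)

    def add(self, tenant, text):
        self.docs.append((tenant, Counter(text.lower().split())))

    def _idf(self, term):
        # Assumption: textbook idf = log(N / df), computed over ALL tenants'
        # documents. Real engines use smoothed variants of this formula.
        n = len(self.docs)
        df = sum(1 for _, tf in self.docs if term in tf)
        return math.log(n / df)

    def search(self, tenant, term):
        """Score with index-wide statistics, THEN filter by owner.
        Returns the tenant's matching scores in insertion order."""
        if not any(term in tf for _, tf in self.docs):
            return []
        idf = self._idf(term)
        return [tf[term] * idf for owner, tf in self.docs
                if owner == tenant and term in tf]


def estimate_df(index, attacker, term):
    """Hypothetical helper: infer how many documents in the shared index
    contain `term`, using only the attacker's own filtered results."""
    # Assumption: a random nonce term occurs in no other tenant's document.
    nonce = "n" + secrets.token_hex(8)
    index.add(attacker, f"{term} {nonce}")  # probe doc, tf = 1 for both terms
    # The probe doc was added last, so its score is the last result.
    score_term = index.search(attacker, term)[-1]    # = log(N / df)
    score_nonce = index.search(attacker, nonce)[-1]  # = log(N / 1)
    # log(N) - log(N / df) = log(df); round away floating-point error.
    return round(math.exp(score_nonce - score_term))


if __name__ == "__main__":
    idx = SharedIndex()
    idx.add("alice", "merger talks with acme are confidential")
    idx.add("bob", "acme invoice overdue")
    idx.add("carol", "lunch menu for friday")
    for word in ("acme", "lunch", "secret"):
        # Subtract 1 to exclude the attacker's own probe document.
        others = estimate_df(idx, "mallory", word) - 1
        print(word, others)
```

In this run the attacker ("mallory") learns that two other tenants' documents mention "acme" and that no other document contains "secret", although filtering never returns those documents. A df above the attacker's own count also reveals that a term occurs somewhere in other tenants' data, which corresponds to goal (1). A practical attack must additionally overcome the deployment challenges noted above, such as locating and placing documents across many indexes.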
