Can We Securely Outsource Big Data Analytics with Lightweight Cryptography?

Advances in cryptography such as secure multiparty computation (SMC) and fully-/somewhat-homomorphic encryption (FHE/SHE) have already provided a generic solution to the problem of processing encrypted data; however, they are still not that efficient when applied directly to big data analytics. Many cryptographers have recently designed specialized privacy-preserving frameworks for neural networks. While promising, these frameworks are still not entirely satisfactory. Gazelle (USENIX Security 2018) supports inference but not training. SecureNN (PoPETs 2019), even with the help of non-colluding servers, is still orders of magnitude slower than plaintext training/inference. To narrow the gap between theory and practice, we put forward a new paradigm for privacy-preserving big data analytics which leverages both a trusted processor such as Intel SGX (Software Guard Extensions) and an (untrusted) GPU (Graphics Processing Unit). Note that SGX is not a silver bullet in this scenario. In general, SGX is subject to a memory constraint (roughly 128MB of protected memory in current implementations) which can easily be exceeded by a single layer of today's ever-growing neural networks. Relying on generic solutions such as the paging mechanism is, again, inefficient. The GPU is an ideal platform for deep learning, yet we do not want to assume it is trusted; we thus still need cryptographic techniques. In this keynote, we will briefly survey the research landscape of privacy-preserving machine learning, point out the obstacles brought by seemingly slight changes of requirements (e.g., a single query over different data sources, multiple model owners, outsourcing a trained model to an untrusted cloud), and highlight a number of settings that aid in ensuring privacy without heavyweight cryptography. We will also discuss two notable recent works, Graviton (OSDI 2018) and Slalom (ICLR 2019), and our ongoing research.
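
To make the SGX-plus-GPU division of labor concrete, below is a minimal sketch in the spirit of Slalom, written in NumPy for readability. The dimensions, variable names, and the use of floating-point arithmetic are our own illustrative assumptions (Slalom itself works over a finite field with quantized weights, precisely so that the blinding term acts as a perfect one-time pad): the enclave blinds its private input with a precomputed random mask, the untrusted GPU performs the heavy linear-layer multiplication, and the enclave unblinds the result and checks its integrity with Freivalds' probabilistic test.

    import numpy as np

    rng = np.random.default_rng(0)
    d_in, d_out, batch = 1024, 256, 32        # hypothetical layer dimensions

    W = rng.standard_normal((d_out, d_in))    # model weights, known to the GPU
    X = rng.standard_normal((d_in, batch))    # private inputs, held by the enclave

    # Offline phase (enclave): precompute the blinding factors R and W @ R,
    # e.g., during idle time, before the private inputs arrive.
    R = rng.standard_normal((d_in, batch))
    WR = W @ R

    # Online phase: only the blinded inputs X + R leave the enclave;
    # the untrusted GPU does the expensive multiplication.
    Y_gpu = W @ (X + R)

    # Enclave unblinds: W @ (X + R) - W @ R = W @ X.
    Y = Y_gpu - WR

    # Freivalds' check (enclave): verify the GPU's answer probabilistically
    # with two cheap matrix-vector products instead of redoing W @ (X + R).
    s = rng.integers(0, 2, size=batch).astype(float)
    assert np.allclose(Y_gpu @ s, W @ ((X + R) @ s))

    assert np.allclose(Y, W @ X)              # sanity check for this sketch

A wrong GPU answer passes a single Freivalds test with probability at most 1/2 (over the random choice of s), so repeating the check drives the error probability down exponentially, at a cost far below that of recomputing the product inside the memory-constrained enclave.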