Regaining Lost Seconds: Efficient Page Preloading for SGX Enclaves

Intel SGX is already available, with a strong emphasis on security and privacy, but its protection is not free. Studies have shown that applications incur a significant performance overhead when they take advantage of the security and privacy enhancements SGX offers. In particular, SGX provides only a limited amount of physical memory to the applications that use it. As a result, page faults can be triggered frequently during program execution, especially for memory-intensive applications with a large memory footprint. It is therefore imperative to look for optimization opportunities that improve the efficiency of SGX. To this end, this paper proposes memory page preloading to mitigate the problem. More specifically, we propose two effective schemes that preload memory pages before they are accessed, which significantly reduces the number of page faults. To demonstrate the effectiveness of the proposed schemes, we have implemented them in a prototype using LLVM and an untrusted operating system. Experimental results on benchmarks from SPEC CPU2017 and a micro-benchmark program show that the two mechanisms achieve average performance improvements of 11.4% and 7.0%, with maximum improvements of 18.6% and 9.0%, respectively. The two mechanisms are also evaluated when deployed together: the combined approach achieves an improvement of 7.1% on real-world applications such as SIFT and MSER.
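The general idea of preloading under limited memory can be illustrated with a toy model. The sketch below is not either of the paper's two schemes: it assumes a hypothetical LRU-managed pool of resident pages and a simple "preload the next page" policy, and the names `count_faults`, `capacity`, and `preload_next` are illustrative only. It counts demand page faults and ignores the cost of the preload itself.

```python
from collections import OrderedDict


def count_faults(trace, capacity, preload_next=False):
    """Count demand page faults for an LRU pool holding `capacity` pages.

    Toy model (assumption): when `preload_next` is set, every access to
    page p also brings in page p + 1 ahead of time.
    """
    resident = OrderedDict()  # page number -> None, oldest first
    faults = 0

    def load(page):
        # Make `page` resident and most recently used, evicting the
        # least recently used page if the pool is full.
        if page in resident:
            resident.move_to_end(page)
            return
        if len(resident) >= capacity:
            resident.popitem(last=False)
        resident[page] = None

    for page in trace:
        if page not in resident:
            faults += 1  # the page was not resident when it was needed
        load(page)
        if preload_next:
            load(page + 1)  # hypothetical sequential preload
    return faults


# Two sequential scans over 16 pages with room for only 4 resident pages.
trace = list(range(16)) * 2
print(count_faults(trace, capacity=4))                     # 32
print(count_faults(trace, capacity=4, preload_next=True))  # 2
```

In this model every access of the plain run faults because the working set exceeds the pool, while the preloading run faults only on the first page of each scan. Real access patterns are far less regular, which is why a practical scheme has to predict which pages to preload.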
