Data Oblivious Algorithms for Multicores

As secure processors such as Intel SGX (with hyperthreading) become widely adopted, there is a growing appetite for private analytics on big data. Most prior work on data-oblivious algorithms adopts the classical PRAM model to capture parallelism. However, it is widely understood that PRAM neither captures realistic multicore processors well nor reflects the parallel programming models adopted in practice. In this paper, we initiate the study of parallel data-oblivious algorithms on realistic multicores, which are best captured by the binary fork-join model of computation. We first show that data-oblivious sorting can be accomplished by a binary fork-join algorithm with optimal total work, optimal (cache-oblivious) cache complexity, and O(log n log log n) span (i.e., parallel time), which matches the best known insecure algorithm. Using our sorting algorithm as a core primitive, we show how to data-obliviously simulate general PRAM algorithms in the binary fork-join model with non-trivial efficiency. We also present results for several applications, including list ranking, Euler tour, tree contraction, connected components, and minimum spanning forest. For a subset of these applications, our data-oblivious algorithms asymptotically outperform the best known insecure algorithms. For the other applications, we give data-oblivious algorithms whose performance bounds match those of the best known insecure algorithms. Complementing these asymptotically efficient results, we present a practical variant of our sorting algorithm that is self-contained and potentially implementable. It has optimal caching cost, is only a log log n factor off from optimal work, and is about a log n factor off in span; moreover, it achieves small constant factors in its bounds.
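To make the notion of data-obliviousness concrete, the sketch below implements Batcher's bitonic sorting network in a recursive, fork-join-friendly shape. It is not the sorting algorithm of this paper and does not achieve its bounds (bitonic sort performs O(n log^2 n) work); it only illustrates the defining property that the sequence of memory locations touched depends on the input length alone, not on the input values. The function names and the `trace` list are assumptions introduced for illustration, and Python's `min`/`max` stand in for a constant-time compare-exchange that a real implementation would need.

```python
# Illustrative sketch only: Batcher's bitonic sort, NOT the paper's algorithm.
# The recorded access trace depends on len(values), never on the values.

def compare_exchange(a, i, j, ascending, trace):
    """Always read and write both positions, regardless of their contents."""
    trace.append((i, j))  # record the access pattern (for demonstration)
    lo, hi = min(a[i], a[j]), max(a[i], a[j])
    a[i], a[j] = (lo, hi) if ascending else (hi, lo)

def bitonic_merge(a, lo, n, ascending, trace):
    """Merge the bitonic sequence a[lo:lo+n] into sorted order."""
    if n <= 1:
        return
    half = n // 2
    # In a fork-join setting this loop would itself be split by binary forking.
    for i in range(lo, lo + half):
        compare_exchange(a, i, i + half, ascending, trace)
    # The two recursive calls touch disjoint ranges, so they could be forked.
    bitonic_merge(a, lo, half, ascending, trace)
    bitonic_merge(a, lo + half, half, ascending, trace)

def bitonic_sort(a, lo, n, ascending, trace):
    """Sort a[lo:lo+n]; n must be a power of two."""
    if n <= 1:
        return
    half = n // 2
    # Independent subproblems: a fork-join runtime could run them in parallel.
    bitonic_sort(a, lo, half, True, trace)
    bitonic_sort(a, lo + half, half, False, trace)
    bitonic_merge(a, lo, n, ascending, trace)

def oblivious_sort(values):
    """Return (sorted copy, access trace) for a power-of-two-length input."""
    n = len(values)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    a = list(values)
    trace = []
    bitonic_sort(a, 0, n, True, trace)
    return a, trace

if __name__ == "__main__":
    _, t1 = oblivious_sort([5, 3, 8, 1, 9, 2, 7, 4])
    _, t2 = oblivious_sort([1, 2, 3, 4, 5, 6, 7, 8])
    print(t1 == t2)  # identical access patterns for different inputs
```

Running `oblivious_sort` on two different inputs of the same length yields the same trace, which is the property an adversary observing memory accesses would see. The recursion is written sequentially here; the comments mark where a fork-join runtime could spawn the independent calls.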
