Sorting with Asymmetric Read and Write Costs

Emerging memory technologies exhibit a significant gap between the cost, in both time and energy, of writing to memory and the cost of reading from it. In this paper we present models and algorithms that account for this difference, with a focus on write-efficient sorting algorithms. First, we consider the PRAM model with asymmetric write cost, and show that sorting can be performed in O(n) writes, O(n log n) reads, and logarithmic depth (parallel time). Next, we consider a variant of the External Memory (EM) model that charges a cost of k > 1 for writing a block of size B to secondary memory, and present variants of three EM sorting algorithms (multi-way merge sort, sample sort, and heap sort using buffer trees) that asymptotically reduce the number of writes relative to the original algorithms, and perform roughly k block reads for every block write. Finally, we define a variant of the Ideal-Cache model with asymmetric write costs, and present write-efficient, cache-oblivious parallel algorithms for sorting, FFTs, and matrix multiplication. Adapting prior bounds for work-stealing and parallel-depth-first schedulers to the asymmetric setting, these algorithms yield provably good bounds for parallel machines with private caches or with a shared cache, respectively.
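To make the asymmetric cost accounting concrete, the following is a minimal sketch that counts element reads and writes and charges each write k times the cost of a read. It is an illustration only, not one of the algorithms from the paper: the names `CountingArray`, `selection_sort`, `insertion_sort`, and `asymmetric_cost` are hypothetical, and the cost formula reads + k * writes is an assumed simplification of the models described above. Selection sort is used because it performs O(n) writes, though at the price of a quadratic number of reads, whereas the PRAM result above achieves O(n) writes with only O(n log n) reads.

```python
class CountingArray:
    """List wrapper that counts element reads and writes (hypothetical helper)."""

    def __init__(self, data):
        self._data = list(data)
        self.reads = 0
        self.writes = 0

    def __len__(self):
        return len(self._data)

    def __getitem__(self, i):
        self.reads += 1
        return self._data[i]

    def __setitem__(self, i, value):
        self.writes += 1
        self._data[i] = value

    def snapshot(self):
        """Return a plain copy of the contents without touching the counters."""
        return list(self._data)


def selection_sort(a):
    """In-place selection sort: at most 2(n - 1) writes, quadratically many reads."""
    n = len(a)
    for i in range(n - 1):
        m, best = i, a[i]
        for j in range(i + 1, n):
            x = a[j]
            if x < best:
                m, best = j, x
        if m != i:
            a[m] = a[i]  # move the displaced element to the old minimum slot
            a[i] = best  # place the minimum at its final position


def insertion_sort(a):
    """In-place insertion sort: up to quadratically many writes."""
    for i in range(1, len(a)):
        x = a[i]
        j = i - 1
        while j >= 0:
            y = a[j]
            if y <= x:
                break
            a[j + 1] = y  # shift the larger element one slot to the right
            j -= 1
        a[j + 1] = x


def asymmetric_cost(reads, writes, k):
    """Assumed cost model: a read costs 1, a write costs k > 1."""
    return reads + k * writes


if __name__ == "__main__":
    data = list(range(64, 0, -1))  # reversed input, bad for insertion sort
    for sort in (selection_sort, insertion_sort):
        arr = CountingArray(data)
        sort(arr)
        cost = asymmetric_cost(arr.reads, arr.writes, k=32)
        print(sort.__name__, arr.reads, arr.writes, cost)
```

As k grows, the write count dominates the total, which is why an algorithm with more reads but fewer writes can become the cheaper choice in this setting.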
