Efficient Algorithms with Asymmetric Read and Write Costs

In several emerging main-memory technologies, reads are significantly cheaper than writes. Such asymmetry in memory costs poses a fundamentally different model from the RAM for algorithm design. In this paper we study lower and upper bounds for various problems under such asymmetric read and write costs. We consider both the case in which all but $O(1)$ memory has asymmetric cost, and the case of a small cache of symmetric memory. We model both cases using the $(M,\omega)$-ARAM, in which there is a small (symmetric) memory of size $M$ and a large unbounded (asymmetric) memory, both random access, and where reading from the large memory has unit cost, but writing has cost $\omega\gg 1$. For FFT and sorting networks we show a lower bound of $\Omega(\omega n\log_{\omega M} n)$ on the cost, which indicates that it is not possible to achieve asymptotic improvements with cheaper reads when $\omega$ is bounded by a polynomial in $M$. Moreover, there is an asymptotic gap (of $\min(\omega,\log n)/\log(\omega M)$) between the cost of sorting networks and comparison sorting in the model. This contrasts with the RAM and most other models. We also show a lower bound of $\Omega(\omega n^2/M)$ on the cost of computations on an $n\times n$ diamond DAG, which indicates that no asymptotic improvement is achievable with fast reads. However, we show that for the edit distance problem (and related problems), which would seem to be a diamond DAG computation, there exists an algorithm with only $O(\omega n^2/(M\min(\omega^{1/3},M^{1/2})))$ cost. To achieve this we make use of a "path sketch" technique that is forbidden in a strict DAG computation. Finally, we show several interesting upper bounds for shortest path problems, minimum spanning trees, and other problems. A common theme in many of the upper bounds is the use of redundant computation to trade off between reads and writes.
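
As a rough illustration of how cost is charged in the $(M,\omega)$-ARAM, the following Python sketch counts unit-cost reads and cost-$\omega$ writes to the large memory. It is not taken from the paper: the class `AsymmetricMemory` and the function `selection_sort` are hypothetical names introduced here, and textbook selection sort is used only as a familiar example of spending extra reads to save writes, not as one of the paper's algorithms. Local Python variables stand in for the $O(1)$ symmetric memory and are not charged.

```python
class AsymmetricMemory:
    """Hypothetical model of the large memory: reads cost 1, writes cost omega."""

    def __init__(self, data, omega):
        self._cells = list(data)
        self.omega = omega
        self.reads = 0
        self.writes = 0

    def __len__(self):
        return len(self._cells)

    def read(self, i):
        self.reads += 1
        return self._cells[i]

    def write(self, i, value):
        self.writes += 1
        self._cells[i] = value

    def cost(self):
        # Total cost under the model: unit-cost reads plus omega per write.
        return self.reads + self.omega * self.writes

    def snapshot(self):
        # Uncharged copy of the contents, for inspection only.
        return list(self._cells)


def selection_sort(mem):
    """Sort mem in place with O(n^2) reads but at most 2(n-1) writes."""
    n = len(mem)
    for i in range(n - 1):
        # Scan the unsorted suffix; only O(1) values are held locally.
        min_idx = i
        min_val = mem.read(i)
        for j in range(i + 1, n):
            v = mem.read(j)
            if v < min_val:
                min_idx, min_val = j, v
        # Write only when the minimum is not already in place.
        if min_idx != i:
            displaced = mem.read(i)
            mem.write(i, min_val)
            mem.write(min_idx, displaced)
```

For example, after `mem = AsymmetricMemory([5, 2, 9, 1, 7], omega=10)` and `selection_sort(mem)`, the value of `mem.cost()` is `mem.reads + 10 * mem.writes`. The number of reads grows quadratically with the input length, while the number of writes stays linear, so the write term dominates only when $\omega$ is large relative to $n$.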
