Write-Avoiding Algorithms
James Demmel | Laura Grigori | Oded Schwartz | Harsha Vardhan Simhadri | Nicholas Knight | Penporn Koanantakool | Erin C. Carson
[1] H. Whitney,et al. An inequality related to the isoperimetric inequality , 1949 .
[2] Laszlo A. Belady,et al. A Study of Replacement Algorithms for a Virtual-Storage Computer , 1966, IBM Syst. J..
[3] David A. Patterson,et al. Computer Architecture: A Quantitative Approach , 1969 .
[4] Lynn Elliot Cannon,et al. A cellular computer to implement the Kalman filter algorithm , 1969 .
[5] H. T. Kung,et al. I/O complexity: The red-blue pebble game , 1981, STOC '81.
[6] Robert E. Tarjan,et al. Amortized efficiency of list update and paging rules , 1985, CACM.
[7] Alok Aggarwal,et al. The input/output complexity of sorting and related problems , 1988, CACM.
[8] Alok Aggarwal,et al. Communication Complexity of PRAMs , 1990, Theor. Comput. Sci..
[9] Ramesh C. Agarwal,et al. A high-performance matrix-multiplication algorithm on a distributed-memory parallel computer, using overlapped communication , 1994, IBM J. Res. Dev..
[10] Robert A. van de Geijn,et al. SUMMA: Scalable Universal Matrix Multiplication Algorithm , 1995 .
[11] Robert A. van de Geijn,et al. SUMMA: scalable universal matrix multiplication algorithm , 1995, Concurr. Pract. Exp..
[12] Charles E. Leiserson,et al. Cache-Oblivious Algorithms , 2003, CIAC.
[13] Marc Snir,et al. Getting Up to Speed: The Future of Supercomputing , 2004 .
[14] Dror Irony,et al. Communication lower bounds for distributed-memory matrix multiplication , 2004, J. Parallel Distributed Comput..
[15] Sivan Toledo,et al. Algorithms and data structures for flash memories , 2005, CSUR.
[16] Alexander Tiskin. Communication-efficient parallel generic pairwise elimination , 2007, Future Gener. Comput. Syst..
[17] Sivan Toledo,et al. Characterizing the Performance of Flash Memory Storage Devices and Its Impact on Algorithm Design , 2008, WEA.
[18] Vijayalakshmi Srinivasan,et al. Enhancing lifetime and security of PCM-based Main Memory with Start-Gap Wear Leveling , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[19] Onur Mutlu,et al. Architecting phase change memory as a scalable DRAM alternative , 2009, ISCA '09.
[20] Aamer Jaleel,et al. High performance cache replacement using re-reference interval prediction (RRIP) , 2010, ISCA.
[21] K. Gopalakrishnan,et al. Phase change memory technology , 2010, 1001.1164.
[22] Guy E. Blelloch,et al. Low depth cache-oblivious algorithms , 2010, SPAA '10.
[23] James Demmel,et al. CALU: A Communication Optimal LU Factorization Algorithm , 2011, SIAM J. Matrix Anal. Appl..
[24] Sivan Toledo,et al. Competitive analysis of flash memory algorithms , 2011, TALG.
[25] James Demmel,et al. Communication-Optimal Parallel 2.5D Matrix Multiplication and LU Factorization Algorithms , 2011, Euro-Par.
[26] Paolo Mattavelli,et al. A 4 Mb LV MOS-Selected Embedded Phase Change Memory in 90 nm Standard CMOS Technology , 2011, IEEE Journal of Solid-State Circuits.
[27] Rajesh K. Gupta,et al. NV-Heaps: making persistent objects fast and safe with next-generation, non-volatile memories , 2011, ASPLOS XVI.
[28] David Eklov,et al. Cache Pirating: Measuring the Curse of the Shared Cache , 2011, 2011 International Conference on Parallel Processing.
[29] James Demmel,et al. Minimizing Communication in Numerical Linear Algebra , 2009, SIAM J. Matrix Anal. Appl..
[30] James Demmel,et al. Communication-optimal Parallel and Sequential QR and LU Factorizations , 2008, SIAM J. Sci. Comput..
[31] Dong Li,et al. Identifying Opportunities for Byte-Addressable Non-Volatile Memory in Extreme-Scale Scientific Applications , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.
[32] James Demmel,et al. Graph Expansion Analysis for Communication Costs of Fast Rectangular Matrix Multiplication , 2012, MedAlg.
[33] James Demmel,et al. Brief announcement: strong scaling of matrix multiplication algorithms and memory-independent communication lower bounds , 2012, SPAA '12.
[34] Amin Vahdat,et al. Themis: an I/O-efficient MapReduce , 2012, SoCC '12.
[35] Guy E. Blelloch,et al. Cache and I/O efficent functional algorithms , 2013, POPL.
[36] Graph expansion and communication costs of fast matrix multiplication , 2012, JACM.
[37] Katherine A. Yelick,et al. A Communication-Optimal N-Body Algorithm for Direct Interactions , 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.
[38] Benton Calhoun,et al. A 0.6V 8 pJ/write non-volatile CBRAM macro embedded in a body sensor node for ultra low energy applications , 2013, 2013 Symposium on VLSI Circuits.
[39] James Demmel,et al. Avoiding Communication in Nonsymmetric Lanczos-Based Krylov Subspace Methods , 2013, SIAM J. Sci. Comput..
[40] James Demmel,et al. Communication lower bounds and optimal algorithms for programs that reference arrays - Part 1 , 2013, ArXiv.
[41] Samuel H. Fuller,et al. The Future of Computing Performance: Game Over or Next Level? , 2014 .
[42] Katherine A. Yelick,et al. A Computation- and Communication-Optimal Parallel Direct 3-Body Algorithm , 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.
[43] Wei-Che Tseng,et al. Scheduling to Optimize Cache Utilization for Non-Volatile Main Memories , 2014, IEEE Transactions on Computers.
[44] P. Mueller,et al. PSS: A prototype storage subsystem based on PCM , 2014 .
[45] James Demmel,et al. Communication lower bounds and optimal algorithms for numerical linear algebra , 2014, Acta Numerica.
[46] Krste Asanovic,et al. FireBox: A Hardware Building Block for 2020 Warehouse-Scale Computers , 2014 .
[47] Jin Xiong,et al. A Survey of Phase Change Memory Systems , 2015, Journal of Computer Science and Technology.
[48] Guy E. Blelloch,et al. Sorting with Asymmetric Read and Write Costs , 2015, SPAA.