The cache-oblivious Gaussian elimination paradigm: theoretical framework, parallelization and experimental evaluation
[1] Guy E. Blelloch,et al. The data locality of work stealing , 2000, SPAA.
[2] Alexandru Nicolau,et al. R-Kleene: A High-Performance Divide-and-Conquer Algorithm for the All-Pair Shortest Path for Densely Connected Networks , 2007, Algorithmica.
[3] Guy E. Blelloch,et al. Provably good multicore cache performance for divide-and-conquer algorithms , 2008, SODA '08.
[4] R. Ladner,et al. Cache efficient simple dynamic programming , 2005 .
[5] Matteo Frigo,et al. Cache-oblivious algorithms , 1999, 40th Annual Symposium on Foundations of Computer Science.
[6] Yuefan Deng,et al. New trends in high performance computing , 2001, Parallel Computing.
[7] Matteo Frigo,et al. The implementation of the Cilk-5 multithreaded language , 1998, PLDI.
[8] Richard E. Ladner,et al. Algorithms to Take Advantage of Hardware Prefetching , 2007, ALENEX.
[9] Matteo Frigo,et al. An analysis of dag-consistent distributed shared-memory algorithms , 1996, SPAA '96.
[10] Volker Strumpen,et al. The Cache Complexity of Multithreaded Cache Oblivious Algorithms , 2006, SPAA '06.
[11] Monica S. Lam,et al. A data locality optimizing algorithm , 1991, PLDI '91.
[12] Kenneth E. Iverson,et al. A programming language , 1962, AIEE-IRE '62 (Spring).
[13] Vijaya Ramachandran,et al. Cache-efficient dynamic programming algorithms for multicores , 2008, SPAA '08.
[14] Vijaya Ramachandran,et al. The cache-oblivious Gaussian elimination paradigm: theoretical framework and experimental evaluation , 2006, SPAA '06.
[15] Stephen Warshall,et al. A Theorem on Boolean Matrices , 1962, JACM.
[16] Viktor K. Prasanna,et al. Optimizing graph algorithms for improved cache performance , 2004, Proceedings 16th International Parallel and Distributed Processing Symposium.
[17] Vijaya Ramachandran,et al. Cache-oblivious dynamic programming , 2006, SODA '06.
[18] David S. Greenberg,et al. Beyond core: Making parallel computer I/O practical , 1993 .
[19] Alfred V. Aho,et al. The Design and Analysis of Computer Algorithms , 1974 .
[20] Guy E. Blelloch,et al. Effectively sharing a cache among threads , 2004, SPAA '04.
[21] Donald E. Knuth. Two notes on notation , 1992 .
[22] Sivan Toledo. Locality of Reference in LU Decomposition with Partial Pivoting , 1997, SIAM J. Matrix Anal. Appl.
[23] Steven S. Muchnick,et al. Advanced Compiler Design and Implementation , 1997 .
[24] Robert A. van de Geijn,et al. Supermatrix out-of-order scheduling of matrix operations for SMP and multi-core architectures , 2007, SPAA '07.
[25] Josef Weidendorfer,et al. Valgrind 3.3 - Advanced Debugging and Profiling for Gnu/Linux Applications , 2008 .
[26] Roman Dementiev,et al. STXXL: standard template library for XXL data sets , 2008 .
[27] Mithuna Thottethodi,et al. Recursive array layouts and fast parallel matrix multiplication , 1999, SPAA '99.
[28] Robert A. van de Geijn,et al. FLAME: Formal Linear Algebra Methods Environment , 2001, TOMS.
[29] Keshav Pingali,et al. An experimental comparison of cache-oblivious and cache-conscious programs , 2007, SPAA '07.
[30] Alok Aggarwal,et al. The input/output complexity of sorting and related problems , 1988, CACM.
[31] Ronald L. Rivest,et al. Introduction to Algorithms , 1990 .
[32] H. T. Kung,et al. I/O complexity: The red-blue pebble game , 1981, STOC '81.