QR factorization with Morton-ordered quadtree matrices for memory re-use and parallelism
[1] Chris J. Scheiman, et al. LogGP: incorporating long messages into the LogP model—one step closer towards a realistic model for parallel computation, 1995, SPAA '95.
[2] Nicholas J. Higham, et al. INVERSE PROBLEMS NEWSLETTER, 1991.
[3] J. Davenport. Editor, 1960.
[4] Donald Ervin Knuth, et al. The Art of Computer Programming, 1968.
[5] Mithuna Thottethodi, et al. Recursive Array Layouts and Fast Matrix Multiplication, 2002, IEEE Trans. Parallel Distributed Syst.
[6] Mithuna Thottethodi, et al. Recursive array layouts and fast parallel matrix multiplication, 1999, SPAA '99.
[7] Martin C. Rinard, et al. Recursion Unrolling for Divide and Conquer Programs, 2000, LCPC.
[8] Jeremy D. Frens, et al. Matrix factorization using a block-recursive structure and block-recursive algorithms, 2002.
[9] Jack Dongarra, et al. LAPACK: a portable linear algebra library for high-performance computers, 1990, SC.
[10] Jeremy D. Frens, et al. Auto-blocking matrix-multiplication or tracking BLAS3 performance from source code, 1997, PPOPP '97.
[11] Gene H. Golub, et al. Matrix computations, 1983.
[12] Erik Elmroth, et al. Applying recursion to serial and parallel QR factorization leads to better performance, 2000, IBM J. Res. Dev.
[13] David S. Wise. Ahnentafel Indexing into Morton-Ordered Arrays, or Matrix Locality for Free, 2000, Euro-Par.
[14] Charles E. Leiserson, et al. Cache-Oblivious Algorithms, 2003, CIAC.
[15] Siddhartha Chatterjee, et al. Exact analysis of the cache behavior of nested loops, 2001, PLDI '01.
[16] John Darlington, et al. A Transformation System for Developing Recursive Programs, 1977, J. ACM.
[17] Steven S. Muchnick, et al. Advanced Compiler Design and Implementation, 1997.
[18] Ivan Dimov, et al. Advances in Parallel Algorithms, 1994.
[19] Fumihiko Ino, et al. LogGPS: a parallel computational model for synchronization analysis, 2001, PPoPP '01.
[20] Jack J. Dongarra, et al. Automatically Tuned Linear Algebra Software, 1998, Proceedings of the IEEE/ACM SC98 Conference.
[21] Ramesh Subramonian, et al. LogP: a practical model of parallel computation, 1996, CACM.
[22] Jack Dongarra, et al. ScaLAPACK Users' Guide, 1987.
[23] David S. Wise. Undulant-Block Elimination and Integer-Preserving Matrix Inversion, 1999, Sci. Comput. Program.
[24] Albert Y. Zomaya. Parallel and Distributed Computing Handbook, 1995.
[25] Donald E. Knuth, et al. The Art of Computer Programming, Volume I: Fundamental Algorithms, 2nd Edition, 1997.
[26] Jeremy D. Frens, et al. Language support for Morton-order matrices, 2001, PPoPP '01.
[27] Tom Axford, et al. The divide-and-conquer paradigm as a basis for parallel language design, 1992.
[28] Donald E. Knuth. The art of computer programming: fundamental algorithms, 1969.
[29] James Demmel, et al. Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology, 1997, ICS '97.