Cache-Friendly Implementations of Transitive Closure
[1] James R. Larus,et al. Cache-conscious structure definition , 1999, PLDI '99.
[2] Todd M. Austin,et al. The SimpleScalar tool set, version 2.0 , 1997, CARN.
[3] Viktor K. Prasanna,et al. Dynamic data layouts for cache-conscious factorization of DFT , 2000, Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000.
[4] David A. Patterson,et al. Computer architecture (2nd ed.): a quantitative approach , 1996 .
[5] Yves Robert,et al. Proceedings of the international workshop on Parallel algorithms & architectures , 1986 .
[6] Jaewook Shin,et al. Mapping Irregular Applications to DIVA, a PIM-based Data-Intensive Architecture , 1999, ACM/IEEE SC 1999 Conference (SC'99).
[7] Guang R. Gao,et al. Heap analysis and optimizations for threaded programs , 1997, Proceedings 1997 International Conference on Parallel Architectures and Compilation Techniques.
[8] Charles E. Leiserson,et al. Cache-Oblivious Algorithms , 2003, CIAC.
[9] Ellis Horowitz,et al. Fundamentals of Computer Algorithms , 1978 .
[10] David A. Patterson,et al. Computer Architecture: A Quantitative Approach , 1990 .
[11] Denis Trystram,et al. Parallel algorithms and architectures , 1995 .
[12] Erik R. Altman,et al. Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques , 2006, PACT 2006.
[13] Ali R. Hurson,et al. Effects of Multithreading on Cache Performance , 1999, IEEE Trans. Computers.
[14] Monica S. Lam,et al. The cache performance and optimizations of blocked algorithms , 1991, ASPLOS IV.
[15] Rakesh M. Verma,et al. Tight Bounds for Prefetching and Buffer Management Algorithms for Parallel I/O Systems , 1996, FSTTCS.
[16] Siddhartha Chatterjee,et al. Cache-efficient matrix transposition , 2000, Proceedings Sixth International Symposium on High-Performance Computer Architecture. HPCA-6 (Cat. No.PR00550).
[17] James R. Larus,et al. Cache-conscious structure layout , 1999, PLDI '99.
[18] Ronald L. Rivest,et al. Introduction to Algorithms , 1990 .
[19] H. T. Kung,et al. I/O complexity: The red-blue pebble game , 1981, STOC '81.
[20] Yves Robert,et al. Loop partitioning versus tiling for cache-based multiprocessors , 1998 .
[21] Viktor K. Prasanna,et al. Cache-friendly implementations of transitive closure , 2001, Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques.
[22] Jeffrey D. Ullman. Computational Aspects of VLSI , 1984 .
[23] Sandeep Sen,et al. Towards a theory of cache-efficient algorithms , 2000, SODA '00.
[24] Alfred V. Aho,et al. The Design and Analysis of Computer Algorithms , 1974 .
[25] Jack J. Dongarra,et al. Automatically Tuned Linear Algebra Software , 1998, Proceedings of the IEEE/ACM SC98 Conference.
[26] David A. Patterson,et al. Computer Architecture - A Quantitative Approach (4. ed.) , 2007 .
[27] Peter J. Varman,et al. Optimal prefetching and caching for parallel I/O systems , 2001, SPAA '01.
[28] Sally A. McKee,et al. Caches as filters: a new approach to cache analysis , 1998, Proceedings. Sixth International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (Cat. No.98TB100247).